Abstract

Nuclear norm minimization (NNM) has recently gained significant attention for its use in rank minimization problems. Similar to compressed sensing, recovery thresholds for NNM have been studied in [TH H] using null space characterizations. However, simulations show that these thresholds are far from optimal, especially in the low rank region. In this paper we apply the recent analysis of Stojnic for compressed sensing to the null space conditions of NNM. The resulting thresholds are significantly better, and in particular our weak threshold appears to match simulation results. Further, our curves suggest that for any rank growing linearly with the matrix size $n$, weak recovery needs an oversampling of only three times the model complexity. Similar to [12], we analyze the conditions for weak, sectional and strong thresholds. Additionally, a separate analysis is given for the special case of positive semidefinite matrices. We conclude by discussing simulation results and future research directions.

1 Introduction

Rank minimization (RM) addresses the recovery of a low rank matrix from a set of linear measurements that project the matrix onto a lower dimensional space. The problem has gained extensive attention in the past few years, due to its promising applicability in many practical problems [1]. Suppose that $X_0$ is a low rank matrix of size $n_1 \times n_2$ and let $\mathrm{rank}(X_0) = r$. Further let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ be a linear measurement operator. Given the measurements $y_0 = \mathcal{A}(X_0)$, the problem is to recover $X_0$, knowing that it is low rank. Provided that $X_0$ is the solution with lowest rank, this problem can be formulated as the following minimization program.

$$\min_X \ \mathrm{rank}(X) \quad \text{subject to} \quad \mathcal{A}(X) = \mathcal{A}(X_0) \tag{1}$$

The $\mathrm{rank}(\cdot)$ function is non-convex, and it turns out that (1) is NP-hard and cannot be solved efficiently. Fazel et al. suggested replacing the rank with the nuclear norm heuristic as the closest convex relaxation [1]. The resulting convex optimization program is called nuclear norm minimization and is as follows.



$$\min_X \ \|X\|_\star \quad \text{subject to} \quad \mathcal{A}(X) = \mathcal{A}(X_0) \tag{2}$$

*This work was supported in part by the National Science Foundation under grants CCF-0729203, CNS-0932428 and CCF-1018927, by the Office of Naval Research under the MURI grant N00014-08-1-0747, and by Caltech's Lee Center for Advanced Networking.



where $\|\cdot\|_\star$ refers to the nuclear norm of its argument, i.e., the sum of the singular values. (2) can be written as a semidefinite program (SDP) and thus be solved in polynomial time. Recent works have studied sufficient conditions under which (2) recovers $X_0$ (i.e., $X_0$ is the unique minimizer of (2)). In [3] it is shown that, similar to compressed sensing, the Restricted Isometry Property (RIP) is a sufficient condition for the success of (2) and that $O(r(n_1 + n_2)\log(n_1 n_2))$ measurements are enough to guarantee RIP with high probability. In [19], Candes extended these results and showed that a minimal sampling of $O(rn)$ is in fact enough to have RIP and hence recovery. In later works |3l[l2], necessary and sufficient null space conditions were derived and analyzed for Gaussian measurement operators, i.e., operators whose entries are i.i.d. Gaussian, leading to thresholds for the success of (2). These thresholds establish explicit relationships between the problem parameters, as opposed to the order-wise relationships that result from RIP techniques. However, these results are far from optimal in the low rank regime, which necessitates a new approach. In particular, if the matrix size is $n \times n$ and the rank of the matrix to be recovered is $\beta n$, then even if $\beta > 0$ is very small, they require a minimum sampling of (1 - ^^)n^ for success. In this paper, we develop a novel null space analysis for the rank minimization problem and find significantly better thresholds than the results of [U [12]. Although the analysis is novel for the rank minimization problem, we basically follow the analysis developed for compressed sensing by Stojnic in [18], which is based on a seminal result of Gordon [15]. In addition to the analysis of general matrices, we give a separate analysis for positive semidefinite matrices, which resemble nonnegative vectors in compressed sensing. We also consider the case of unique positive semidefinite solutions, which was recently analyzed by Xu in [23].

We extensively use the results of [18]. Basically, we slightly modify Lemmas 2, 5, 7 of [18] and use null space conditions for the NNM problem. The strength of this analysis comes from the facts that it is more accessible and that the weak threshold of [18] matches the exact threshold of [7]. In fact, while it is not at all clear how to extend the analysis of [7] from compressed sensing to NNM, it is relatively straightforward to do so for [18]. Our simulation results also indicate that our thresholds for the NNM problem are seemingly tight. This is perhaps not surprising since, as we shall see, the null space conditions for NNM and compressed sensing are very similar.



2 Basic Definitions and Notations 

Denote the identity matrix of size $n \times n$ by $I_n$. We call $U \in \mathbb{R}^{m \times n}$ partial unitary if the columns of $U$ form an orthonormal set, i.e., $U^T U = I_n$. Clearly we need $n \le m$ for $U$ to be partial unitary. Also, for a partial unitary $U$, let $\bar{U}$ denote an arbitrary partial unitary of size $m \times (m - n)$ such that $[U \ \bar{U}]$ is a unitary matrix (i.e., its columns form a complete orthonormal basis of $\mathbb{R}^m$).

For a matrix $X \in \mathbb{R}^{m \times n}$, we denote the singular values by $\sigma_1(X) \ge \sigma_2(X) \ge \dots \ge \sigma_q(X)$ where $q = \min(m, n)$. The (skinny) singular value decomposition (SVD) of $X$ is written as $X = U_X \Sigma_X V_X^T$ where $U_X \in \mathbb{R}^{m \times r}$, $\Sigma_X \in \mathbb{R}^{r \times r}$ and $V_X \in \mathbb{R}^{n \times r}$, with $r = \mathrm{rank}(X)$. Note that $U_X, V_X$ are partial unitary and $\Sigma_X$ is positive, diagonal and full rank. Also let $\Sigma(X)$ denote the vector of increasingly ordered singular values of $X$, i.e., $\Sigma(X) = [\sigma_q(X) \ \dots \ \sigma_1(X)]^T$.

The Ky-Fan $k$ norm of $X$, denoted by $\|X\|_k$, is defined as $\|X\|_k = \sum_{i=1}^k \sigma_i(X)$. When $k = \min(m, n)$ it is called the nuclear norm, i.e., $\|X\|_\star$, and when $k = 1$ it is equivalent to the spectral norm, denoted by $\|X\|$. Also, the Frobenius norm is denoted by $\|X\|_F = \sqrt{\langle X, X \rangle} = \sqrt{\sum_{i=1}^q \sigma_i^2(X)}$. Note that we always have:

$$\|X\|_k = \sum_{i=1}^k \sigma_i(X) \le \sqrt{k}\,\sqrt{\sum_{i=1}^k \sigma_i^2(X)} \le \sqrt{k}\,\|X\|_F \tag{3}$$

For a linear operator $\mathcal{A}(\cdot)$ acting on a linear space, we denote the null space of $\mathcal{A}$ by $\mathcal{N}(\mathcal{A})$, i.e., $W \in \mathcal{N}(\mathcal{A})$ iff $\mathcal{A}(W) = 0$. We denote by $\mathcal{G}(d_1, d_2)$ the ensemble of real $d_1 \times d_2$ matrices in which the entries are i.i.d. $\mathcal{N}(0, 1)$ (zero-mean, unit variance Gaussian).



It is a well known fact that the normalized singular values of a square matrix with i.i.d. Gaussian entries asymptotically follow the quarter circle distribution [2]. In other words, the histogram of singular values (normalized by $1/\sqrt{n}$) converges to the function

$$\phi(x) = \frac{\sqrt{4 - x^2}}{\pi}, \quad 0 \le x \le 2 \tag{4}$$



Similarly, the distribution of the squares of the singular values (normalized by $1/n$) converges to the well known Marcenko-Pastur distribution [2]. Note that this is nothing but the distribution of the eigenvalues of $X^T X$, where $X$ is a square matrix drawn from $\mathcal{G}(n, n)$:

$$\phi_2(x) = \frac{\sqrt{4x - x^2}}{2\pi x}, \quad 0 \le x \le 4 \tag{5}$$

Let $F(x)$ be the cumulative distribution function of $\phi(x)$, i.e.,

$$F(x) = \int_0^x \phi(t)\,dt \tag{6}$$
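As a small numerical companion to (4) and (6), the following Python sketch evaluates the quarter circle density and its c.d.f. It assumes the closed form $F(x) = \frac{1}{\pi}\left(\frac{x}{2}\sqrt{4 - x^2} + 2\arcsin\frac{x}{2}\right)$, which is obtained here by integrating (4) directly and is not stated in the text; the function names and the bisection-based inverse are illustrative choices.

```python
import math

def phi(x):
    """Quarter circle density (4); zero outside [0, 2]."""
    if x < 0.0 or x > 2.0:
        return 0.0
    return math.sqrt(4.0 - x * x) / math.pi

def F(x):
    """C.d.f. (6) in closed form, obtained by integrating phi from 0 to x."""
    x = min(max(x, 0.0), 2.0)
    return (0.5 * x * math.sqrt(4.0 - x * x) + 2.0 * math.asin(0.5 * x)) / math.pi

def F_inv(p, tol=1e-12):
    """Inverse of F by bisection, using that F is increasing on [0, 2]."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The inverse $F^{-1}$ is the quantity that appears later in the threshold equations, so a bisection is sufficient for reproducing the curves numerically.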



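Let $0 \le \beta \le 1$. We define $\gamma(\beta)$ to be the asymptotic normalized expected value of the Ky-Fan $\beta n$ norm of a matrix $X$ drawn from $\mathcal{G}(n, n)$, i.e.:

$$\gamma(\beta) := \lim_{n \to \infty} \frac{\mathbb{E}[\|X\|_{\beta n}]}{n^{3/2}} = \int_{F^{-1}(1 - \beta)}^{2} x\,\phi(x)\,dx \tag{7}$$

Similarly define $\gamma_2(\beta)$ to be the asymptotic normalized expected value of the Ky-Fan $\beta n$ norm of a matrix $X^T X$, where $X$ is drawn from $\mathcal{G}(n, n)$:

$$\gamma_2(\beta) = \lim_{n \to \infty} \frac{\mathbb{E}[\|X^T X\|_{\beta n}]}{n^2} = \lim_{n \to \infty} \frac{\mathbb{E}\left[\sum_{i=1}^{\beta n} \sigma_i(X)^2\right]}{n^2} = \int_{F^{-1}(1 - \beta)}^{2} x^2\,\phi(x)\,dx \tag{8}$$

Note that these limits exist and $\gamma(\beta), \gamma_2(\beta)$ are well defined [17].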
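The quantities $\gamma(\beta)$ and $\gamma_2(\beta)$ can be evaluated numerically from the quarter circle law. The sketch below is one possible way to do so in Python; it assumes the closed form $\int_a^2 x\,\phi(x)\,dx = (4 - a^2)^{3/2}/(3\pi)$ (derived here, not taken from the text) and uses a plain midpoint rule for $\gamma_2$. All function names are hypothetical.

```python
import math

def F(x):
    """Quarter circle c.d.f. on [0, 2] (closed form of the integral of phi)."""
    x = min(max(x, 0.0), 2.0)
    return (0.5 * x * math.sqrt(4.0 - x * x) + 2.0 * math.asin(0.5 * x)) / math.pi

def F_inv(p, tol=1e-12):
    """Inverse c.d.f. by bisection."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma(beta):
    """gamma(beta): integral of x * phi(x) from F^{-1}(1 - beta) to 2."""
    a = F_inv(1.0 - beta)
    return (4.0 - a * a) ** 1.5 / (3.0 * math.pi)

def gamma2(beta, steps=20000):
    """gamma_2(beta): integral of x^2 * phi(x) over the same range (midpoint rule)."""
    a = F_inv(1.0 - beta)
    width = (2.0 - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * width
        total += x * x * math.sqrt(4.0 - x * x) / math.pi
    return total * width
```

For example, `gamma(1.0)` evaluates the full first moment $8/(3\pi)$ of the quarter circle law, and `gamma2(1.0)` is close to 1, the normalized expected squared Frobenius norm.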

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called $L$-Lipschitz if for all $x, y$ we have $|f(x) - f(y)| \le L \|x - y\|_{\ell_2}$.

We say an orthogonal projection pair $\{P, Q\}$ is a support of the matrix $X$ if $X = P X Q^T$. In particular, $\{P_X, Q_X\}$ is the unique support of the matrix $X$ if $P_X$ and $Q_X$ are orthogonal projectors with $\mathrm{rank}(P_X) = \mathrm{rank}(Q_X) = \mathrm{rank}(X)$ such that $X = P_X X Q_X^T$. In other words, $P_X = U_X U_X^T$ and $Q_X = V_X V_X^T$.

We say $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ is a random Gaussian measurement operator if the $i$'th measurement is $y_i = \mathcal{A}(X)_i = \mathrm{trace}(G_i^T X)$, where $\{G_i\}_{i=1}^m$ are i.i.d. matrices drawn from $\mathcal{G}(n_1, n_2)$ for all $1 \le i \le m$. Note that this is equivalent to $y_i = \mathrm{vec}(G_i)^T \mathrm{vec}(X)$, where $\mathrm{vec}(X)$ is obtained by putting the columns of $X$ on top of each other to get a vector of size $n_1 n_2 \times 1$.

Model complexity is defined as the number of degrees of freedom of the matrix. For a matrix of size $\alpha n \times n$ and rank $\alpha\beta n$, the model complexity is $\alpha\beta(1 + \alpha - \alpha\beta + o(1))n^2$. We then define the normalized model complexity to be $\theta = \beta(1 + \alpha - \alpha\beta)$.
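To make the definition concrete, the following Python sketch computes the degrees of freedom of a rank-constrained matrix and the normalized model complexity. It assumes the standard count $r(n_1 + n_2 - r)$ for an $n_1 \times n_2$ matrix of rank $r$, which reproduces the expression above up to the $o(1)$ term; the function names are illustrative.

```python
def degrees_of_freedom(rows, cols, rank):
    """Free parameters of a rows x cols matrix of the given rank: r * (n1 + n2 - r)."""
    return rank * (rows + cols - rank)

def normalized_model_complexity(alpha, beta):
    """theta = beta * (1 + alpha - alpha * beta) for an (alpha n) x n matrix of rank alpha * beta * n."""
    return beta * (1.0 + alpha - alpha * beta)

# Example with n = 100, alpha = 0.5, beta = 0.2: a 50 x 100 matrix of rank 10.
n, alpha, beta = 100, 0.5, 0.2
dof = degrees_of_freedom(int(alpha * n), n, int(alpha * beta * n))
theta = normalized_model_complexity(alpha, beta)
# theta rescaled by the number of entries (alpha * n^2) recovers dof in this example.
rescaled = theta * alpha * n * n
```

In the square case ($\alpha = 1$) the normalized model complexity reduces to $\theta = \beta(2 - \beta)$.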

Finally, let $\succeq$ denote "greater than" in partially ordered sets. In particular, if $A, B$ are Hermitian matrices then $A \succeq B \iff A - B$ is positive semidefinite. Similarly, for two given vectors $u, v$ we write $u \succeq v \iff u_i \ge v_i \ \forall i$.



3 Key Lemmas to be Used 

In this section, we state several lemmas that we will make use of later. Proofs that are omitted can be 
found in the given references. 

For Lemmas 1, 2, 3, let $X, Y, Z \in \mathbb{R}^{m \times n}$ with $m \le n$.

Lemma 1.

$$\mathrm{tr}(X^T Y) \le \sum_{i=1}^m \sigma_i(X)\,\sigma_i(Y) = \Sigma(X)^T \Sigma(Y) \tag{9}$$

Proof. Can be found in [20]. ■

In the case of vectors (i.e., when the matrices are diagonal) we have the following simple extension: let $x, y \in \mathbb{R}^m$ be vectors and let $x_{[i]}$ be the $i$'th largest value of the vector $|x|$ (i.e., $|x|_i = |x_i|$). Then

$$\langle x, y \rangle \le \sum_{i=1}^m x_{[i]}\, y_{[i]} \tag{10}$$
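The vector inequality (10) is easy to check numerically. The following Python sketch does so for random vectors, assuming that the right hand side pairs the $i$'th largest absolute value of $x$ with the $i$'th largest absolute value of $y$; the helper names are hypothetical.

```python
import random

def inner_product(x, y):
    """Plain inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def sorted_abs_inner_product(x, y):
    """Right hand side of (10): match the sorted absolute values of x and y."""
    xs = sorted((abs(v) for v in x), reverse=True)
    ys = sorted((abs(v) for v in y), reverse=True)
    return sum(a * b for a, b in zip(xs, ys))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
y = [rng.gauss(0.0, 1.0) for _ in range(8)]
# The gap is nonnegative whenever (10) holds.
gap = sorted_abs_inner_product(x, y) - inner_product(x, y)
```

Lemma 1 is the matrix analogue of the same statement, with singular values playing the role of the sorted absolute entries.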

Lemma 2. Let $Z = X - Y$. Let $s_i(X, Y) = |\sigma_i(X) - \sigma_i(Y)|$ and let $s_{[1]}(X, Y) \ge s_{[2]}(X, Y) \ge \dots \ge s_{[m]}(X, Y)$ be a decreasingly ordered arrangement of $\{s_i(X, Y)\}_{i=1}^m$. Then we have the following inequality:

$$\forall\, m \ge k \ge 1: \quad \sum_{i=1}^k s_{[i]}(X, Y) \le \sum_{i=1}^k \sigma_i(Z) = \|Z\|_k \tag{11}$$

In particular we have:

$$\sum_{i=1}^m |\sigma_i(X) - \sigma_i(Y)| \le \|Z\|_\star \tag{12}$$

Proof. Can be found in [16, 21]. ■
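Lemma 3. If the matrix $X = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}$, then we have:

$$\|X\|_\star \ge \|X_{11}\|_\star + \|X_{22}\|_\star \tag{13}$$

Proof. Can be found in [4]. ■

Similarly, we have the following obvious inequality when $X$ is square ($m = n$):

$$\|X\|_\star \ge \mathrm{trace}(X) \tag{14}$$

Proof. The dual norm of the nuclear norm is the spectral norm [16]. Recall that $I_m$ is the identity. Then:

$$\|X\|_\star = \sup_{\|Y\| = 1} \langle X, Y \rangle \ge \langle X, I_m \rangle = \mathrm{trace}(X) \tag{15}$$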



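Theorem 1. (Escape through a mesh, [15]) Let $S$ be a subset of the unit Euclidean sphere $S^{n-1}$ in $\mathbb{R}^n$. Let $Y$ be a random $(n - m)$-dimensional subspace of $\mathbb{R}^n$, distributed uniformly in the Grassmanian with respect to Haar measure. Let

$$w(S) = \mathbb{E} \sup_{w \in S} (h^T w) \tag{16}$$

where $h$ is a column vector drawn from $\mathcal{G}(n, 1)$. Then if $w(S) < \sqrt{m} - \frac{1}{4\sqrt{m}}$ we have:

$$P(Y \cap S = \emptyset) > 1 - 3.5 \exp\left(-\frac{\left(\sqrt{m} - \frac{1}{4\sqrt{m}} - w(S)\right)^2}{18}\right) \tag{17}$$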
Lemma 4. For all $1 \le k \le n$, $\sigma_k(X)$ is a 1-Lipschitz function of $X$.

Proof. Let $X, \hat{X}$ be such that $\tilde{X} = X - \hat{X}$ and $\|\tilde{X}\|_F \le 1$. Then from Lemma 2 we have:

$$1 \ge \|\tilde{X}\|_F \ge \sigma_1(\tilde{X}) \ge s_{[1]}(X, \hat{X}) \ge |\sigma_k(X) - \sigma_k(\hat{X})| \tag{18}$$

Lemma 5. (from [13]) Let $x$ be drawn from $\mathcal{G}(n, 1)$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be a function with Lipschitz constant $L$. Then we have the following concentration inequality:

$$P(|f(x) - \mathbb{E} f(x)| \ge t) \le 2\exp\left(-\frac{t^2}{2L^2}\right) \tag{19}$$

For analyzing positive semidefinite matrices, we will introduce some more definitions and lemmas later on.

4 Thresholds for Square Matrices 

In the following section, we give and analyze strong, sectional and weak null space conditions for square matrices ($\mathbb{R}^{n \times n}$). With minor modifications, one can obtain the equivalent results for rectangular matrices.

4.1 Strong Threshold 

Strong recovery threshold. Let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$ be a random Gaussian operator. We define $\beta$ ($0 \le \beta \le 1$) to be the strong recovery threshold if with high probability $\mathcal{A}$ satisfies the following property:
Any matrix $X$ with rank at most $\beta n$ can be recovered from measurements $\mathcal{A}(X)$ via (2).

Lemma 6. Using (2) one can recover all matrices $X$ of rank at most $r$ if and only if for all $W \in \mathcal{N}(\mathcal{A})$ we have

$$2\|W\|_r < \|W\|_\star \tag{20}$$

Proof. If (20) holds, then using Lemma 2 and the fact that $\sigma_i(X) = 0 \ \forall\, i > r$, for any $W$ we have

$$\|X + W\|_\star \ge \sum_{i=1}^n |\sigma_i(X) - \sigma_i(W)| \ge \sum_{i=1}^r (\sigma_i(X) - \sigma_i(W)) + \sum_{i=r+1}^n \sigma_i(W) \tag{21}$$
$$\ge \|X\|_\star + \|W\|_\star - 2\|W\|_r > \|X\|_\star \tag{22}$$

Hence $X$ is the unique minimizer of (2). Conversely, if (20) doesn't hold for some $W$, then choose $X = -W_r$, where $W_r$ is the matrix induced by setting all but the largest $r$ singular values of $W$ to 0. Then we get $\|X + W\|_\star = \sum_{i=r+1}^n \sigma_i(W) \le \sum_{i=1}^r \sigma_i(W) = \sum_{i=1}^r \sigma_i(X) = \|X\|_\star$. Finally we find $\mathrm{rank}(X) \le r$ but $X$ is not the unique minimizer. ■

Now we can start analyzing the strong null space condition for the NNM problem. $\mathcal{A}$ is a random Gaussian operator and we'll analyze the linear regime where $m = \mu n^2$ and $r = \beta n$. Our aim is to determine the least $\mu$ ($1 \ge \mu \ge 0$) so that $\beta$ is a strong threshold for $\mathcal{A}$. Similar to compressed sensing, the null space of $\mathcal{A}$ is an $n^2 - m$ dimensional random subspace of $\mathbb{R}^{n^2}$ distributed uniformly in the Grassmanian w.r.t. Haar measure. This can also be viewed as the span of $M = (1 - \mu)n^2$ matrices $\{G_i\}_{i=1}^M$ drawn i.i.d. from $\mathcal{G}(n, n)$. Then, similar to [18], we have established the necessary framework.

Let $S_s$ be the set of all matrices $W$ such that $\|W\|_\star \le 2\|W\|_r$ and $\|W\|_F = 1$. We need to make sure the null space of $\mathcal{A}$ has no intersection with $S_s$. We will first upper bound (16) in Theorem 1 and then choose $m$ (and $\mu$) accordingly.

As a first step, given a fixed $H \in \mathbb{R}^{n \times n}$ we'll calculate an upper bound on $f(H, S_s) = \sup_{W \in S_s} \mathrm{vec}(H)^T \mathrm{vec}(W) = \sup_{W \in S_s} \langle H, W \rangle$. Note that from Lemma 1 we have:



$$f(H, S_s) = \sup_{W \in S_s} \langle H, W \rangle \le \sup_{W \in S_s} \Sigma(H)^T \Sigma(W) \tag{23}$$

The careful reader will notice that we actually have equality in (23): the set $S_s$ is unitarily invariant, hence any value attained on the right hand side can also be attained on the left hand side by aligning the singular vectors of $H$ and $W$. Let $h = \Sigma(H)$, $w = \Sigma(W)$. Note that $h, w \succeq 0$. Then, since $\|w\|_{\ell_2} = \|W\|_F = 1$ and $\sum_{i=1}^{n-r} w_i - \sum_{i=n-r+1}^{n} w_i \le 0$ for $W \in S_s$, we need to solve the following optimization problem given $H$:

$$\begin{aligned} \max_{y} \quad & h^T y \\ \text{subject to} \quad & y \succeq 0 \\ & \sum_{i=n-r+1}^{n} y_i \ge \sum_{i=1}^{n-r} y_i \\ & \|y\|_{\ell_2} \le 1 \end{aligned} \tag{24}$$

Clearly the right hand side of (23) and the result of (24) are the same, because $h^T y$ is maximized when $\{y_i\}_{i=1}^n$ is sorted increasingly, due to Lemma 1.

Note that (24) is exactly the same as (10) of [18]. Then we can use (22), (29) of [18] directly to get:

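Lemma 7. If $h^T z > 0$ then

$$f(H, S_s) \le \sqrt{\sum_{i=c+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c} h_i\right)^2}{n - c}} \tag{25}$$

where $z \in \mathbb{R}^n$ is such that $z_i = 1 \ \forall\, 1 \le i \le n - r$ and $z_i = -1 \ \forall\, n - r + 1 \le i \le n$, and $0 \le c \le n - r$ is such that $(h^T z) - \sum_{i=1}^{c} h_i \ge (n - c) h_c$. As long as $h^T z > 0$ we can find such a $c \ge 0$. In addition, in order to minimize the right hand side of (25), one should choose the largest such $c$.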
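The bound of Lemma 7 is straightforward to evaluate for a given vector of singular values. The Python sketch below is one way to do it, assuming the bound has the form $\sqrt{\sum_{i>c} h_i^2 - ((h^T z) - \sum_{i \le c} h_i)^2/(n - c)}$ with $c$ the largest index satisfying $(h^T z) - \sum_{i \le c} h_i \ge (n - c) h_c$, and falling back to $\|h\|_{\ell_2}$ when $h^T z \le 0$. The function name and the convention $h_0 = 0$ for $c = 0$ are assumptions made for illustration.

```python
import math
import random

def lemma7_bound(h, r):
    """Upper bound on sup <H, W> over S_s, given the increasingly sorted
    singular values h of H and the rank parameter r."""
    n = len(h)
    # h^T z with z = (1, ..., 1, -1, ..., -1), the last r entries being -1.
    hz = sum(h[: n - r]) - sum(h[n - r:])
    if hz <= 0:
        return math.sqrt(sum(v * v for v in h))  # Cauchy-Schwarz fallback
    best_c, best_prefix, running = 0, 0.0, 0.0
    for c in range(1, n - r + 1):
        running += h[c - 1]  # sum of the c smallest singular values
        if hz - running >= (n - c) * h[c - 1]:
            best_c, best_prefix = c, running  # keep the largest admissible c
    tail_sq = sum(v * v for v in h[best_c:])
    value = tail_sq - (hz - best_prefix) ** 2 / (n - best_c)
    return math.sqrt(max(value, 0.0))

rng = random.Random(1)
n, r = 12, 3
h = sorted(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
bound = lemma7_bound(h, r)
```

Any feasible point of the underlying optimization problem (nonnegative, unit norm, with the last $r$ entries summing to at least the first $n - r$) gives a value of $h^T y$ that should not exceed `bound`, which is a convenient sanity check.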

In the case $h^T z \le 0$, the following is the obvious upper bound from Cauchy-Schwarz and the fact that $\|W\|_F = 1$:

$$f(H, S_s) \le \|h\|_{\ell_2} = \sqrt{\sum_{i=1}^n h_i^2} \tag{26}$$



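Similar to [18], for the escape through a mesh (ETM) analysis, using Lemma 7 we'll consider the following worse upper bound:

Lemma 8. Let $z$ be defined as in Lemma 7. Let $H$ be chosen from $\mathcal{G}(n, n)$ and let $h = \Sigma(H)$ and $f(H, S_s) = \sup_{W \in S_s} \langle H, W \rangle$. Then we have $f(H, S_s) \le B_s$ where

$$B_s = \begin{cases} \|h\|_{\ell_2} & \text{if } g(H, c_s) \le 0 \\ \sqrt{\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}} & \text{else} \end{cases}$$

where $g(H, c) = \frac{(h^T z) - \sum_{i=1}^{c} h_i}{n - c} - h_c$ and $c_s = \delta_s n$ is a $c \le n - r$ such that

$$\begin{cases} c_s = 0 & \text{if } \mathbb{E}[h^T z] \le 0 \\ c_s \text{ is the solution of } (1 - \epsilon)\frac{\mathbb{E}\left[(h^T z) - \sum_{i=1}^{c} h_i\right]}{\sqrt{n}\,(n - c)} = F^{-1}\!\left(\frac{(1 + \epsilon)c}{n}\right) & \text{else if } \mathbb{E}[h^T z] > 0 \end{cases} \tag{27}$$

where $\epsilon > 0$ can be arbitrarily small. Note that $c_s$ is deterministic. Secondly, one can observe that $c_s > 0 \implies \mathbb{E}[h^T z] > \mathbb{E}[h^T z - \sum_{i=1}^{c_s} h_i] > 0$.

Here $F(\cdot)$ is the c.d.f. of the quarter circle distribution previously defined in (6).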
4.1.1 Probabilistic Analysis of $\mathbb{E}[B_s]$

The matrix $H$ is drawn from $\mathcal{G}(n, n)$ and $\mathbb{E}[B_s] \ge \mathbb{E}[f(H, S_s)]$. In the following discussion, we'll focus on the case $\mathbb{E}[h^T z] > 0$ and declare failure (no recovery) otherwise. This is reasonable since our approach will eventually lead to $\mu = 1$ in the case $\mathbb{E}[h^T z] \le 0$. The reason is that, with high probability, we'll have $g(H, c_s) \le 0$, which results in $\mathbb{E}[B_s] \approx \mathbb{E}[\|H\|_F]$, the worst upper bound.

Then, we'll basically argue that whenever $\mathbb{E}[h^T z] > 0$, asymptotically with probability one we'll have $g(H, c_s) > 0$. Next, we'll show that the contribution of the region $g(H, c_s) \le 0$ to the expectation of $B_s$ asymptotically converges to 0.

From the union bound, we have: 

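$$P(g(H, c_s) \le 0) \le P\left(h^T z - \sum_{i=1}^{c_s} h_i \le (1 - \epsilon)\,\mathbb{E}\Big[(h^T z) - \sum_{i=1}^{c_s} h_i\Big]\right) + P\left(h_{c_s} \ge \sqrt{n}\,F^{-1}\!\left(\frac{(1 + \epsilon)c_s}{n}\right)\right) \tag{28}$$

We'll analyze the two components separately. Note that $h^T z - \sum_{i=1}^{c_s} h_i$ is a function of the singular values, which is actually a Lipschitz function of the random matrix $H$, as we'll argue in the following lemma.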

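Lemma 9. Let $H \in \mathbb{R}^{n \times n}$, let $h = \Sigma(H)$ and let $z$ be as defined previously. Then

$$f(H) = h^T z - \sum_{i=1}^{c_s} h_i \tag{29}$$

is a $\sqrt{n - c_s}$-Lipschitz function of $H$.

Proof. Let $H, \hat{H}, \tilde{H}$ be such that $\tilde{H} = H - \hat{H}$. From Lemma 2 we have:

$$\|\tilde{H}\|_{n - c_s} \ge \sum_{i=1}^{n - c_s} |\sigma_i(H) - \sigma_i(\hat{H})| \ge \Big|\sum_{i=1}^{r} (\sigma_i(H) - \sigma_i(\hat{H}))\Big| + \Big|\sum_{i=r+1}^{n - c_s} (\sigma_i(H) - \sigma_i(\hat{H}))\Big| \tag{30}$$
$$\ge |f(H) - f(\hat{H})| \tag{31}$$

On the other hand we have $\|\tilde{H}\|_{n - c_s} \le \sqrt{n - c_s}\,\|\tilde{H}\|_F$, which implies $|f(H) - f(\hat{H})| \le \sqrt{n - c_s}\,\|\tilde{H}\|_F$, finishing the proof. ■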

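Now, using the fact that $H$ is i.i.d. Gaussian and $h$ is the vector of singular values of $H$, we have $\mathbb{E}(h^T z - \sum_{i=1}^{c_s} h_i) = (\gamma(1) - 2\gamma(\beta) - (\gamma(1) - \gamma(1 - \delta_s)) + o(1))\,n^{3/2}$. Hence, from Lemma 5 and the fact that $H$ is i.i.d. Gaussian, we have:

$$P_1 := P\left(h^T z - \sum_{i=1}^{c_s} h_i \le (1 - \epsilon)\,\mathbb{E}\Big[(h^T z) - \sum_{i=1}^{c_s} h_i\Big]\right) \le \exp\left(-\frac{\epsilon^2 \left(\gamma(1 - \delta_s) - 2\gamma(\beta) + o(1)\right)^2 n^2}{2(1 - \delta_s)}\right) \tag{32}$$

if $\mathbb{E}[h^T z] > 0$ (which is equivalent to $\mathbb{E}[h^T z - \sum_{i=1}^{c_s} h_i] > 0$ and $\delta_s > 0$).

Similarly, from the quarter circle law we have $\mathbb{E}(h_c) = (F^{-1}(c/n) + o(1))\sqrt{n}$. Using Lemmas 4 and 5 we can find:

$$P_2 := P\left(h_{c_s} \ge \sqrt{n}\,F^{-1}\!\left(\frac{(1 + \epsilon)c_s}{n}\right)\right) \le \exp\left(-\frac{n}{2}\left(F^{-1}((1 + \epsilon)\delta_s) - F^{-1}(\delta_s) + o(1)\right)^2\right) \tag{33}$$

In particular we always have $F^{-1}((1 + \epsilon)\beta) - F^{-1}(\beta) \ge \frac{\pi\epsilon\beta}{2}$ for any $\epsilon > 0$, $1 > \beta > 0$ (because $F'(x) \le \frac{2}{\pi}$ for $0 \le x \le 2$). Hence $P_2$ converges to 0 exponentially fast. One can actually show $P_2 \le \exp(-O(n^2))$ instead of $\exp(-O(n))$; however, this won't affect the results.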
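The quantile gap used to control $P_2$ can be checked numerically. The sketch below assumes the gap bound reads $F^{-1}((1 + \epsilon)\beta) - F^{-1}(\beta) \ge \pi\epsilon\beta/2$, which follows from the fact that the quarter circle density never exceeds $2/\pi$; it also assumes $(1 + \epsilon)\beta \le 1$ so that both quantiles exist. Function names are illustrative.

```python
import math

def F(x):
    """Quarter circle c.d.f. on [0, 2] (closed form of the integral of phi)."""
    x = min(max(x, 0.0), 2.0)
    return (0.5 * x * math.sqrt(4.0 - x * x) + 2.0 * math.asin(0.5 * x)) / math.pi

def F_inv(p, tol=1e-12):
    """Inverse c.d.f. by bisection."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile_gap(beta, eps):
    """F^{-1}((1 + eps) * beta) - F^{-1}(beta), assuming (1 + eps) * beta <= 1."""
    return F_inv((1.0 + eps) * beta) - F_inv(beta)

def gap_lower_bound(beta, eps):
    """pi * eps * beta / 2, valid because the density is at most 2 / pi."""
    return math.pi * eps * beta / 2.0
```

The bound is nearly tight for small $\beta$, where the density is close to its maximum $2/\pi$, and becomes loose as $\beta$ approaches 1.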

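Then, since $\delta_s > 0$: $P(g(H, c_s) \le 0) \le P_1 + P_2 \le \exp\left(-\frac{n}{8}(\pi\epsilon\delta_s + o(1))^2\right)$. It remains to upper bound $\mathbb{E}[B_s]$ as follows:

$$\mathbb{E}[B_s] \le \int_{g(H, c_s) \le 0} \|h\|_{\ell_2}\, p(H)\,dH + \int_{g(H, c_s) > 0} \sqrt{\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}}\; p(H)\,dH \tag{34}$$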



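Note that $g(H, c)$ is a linear function of $h$, and $h$ scales linearly with $H$, so $g(H, c) \le 0 \iff g(aH, c) \le 0$ for any $a > 0$. In other words, similar to the discussion in [18], for any value of $a = \|H\|_F$ the fraction of the region $g(H, c) \le 0$ on the sphere of radius $a$ will be constant. On the other hand, since $H$ is i.i.d. Gaussian, the probability distribution of $H$ is just a function of $\|H\|_F$, i.e., $p(H = H_0) = f(\|H_0\|_F) = (2\pi)^{-n^2/2} \exp(-\frac{1}{2}\|H_0\|_F^2)$ for any matrix $H_0 \in \mathbb{R}^{n \times n}$. As a result:

$$\int_{g(H, c_s) \le 0,\ \|H\|_F = a} dH = C_0 \int_{\|H\|_F = a} dH = C_0 S_a \tag{35}$$

where $S_a$ is the area of a sphere in $\mathbb{R}^{n \times n}$ with radius $a$. Hence

$$P(g(H, c_s) \le 0) = \int_{a \ge 0} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} p(H)\,dH\,da = \int_{a \ge 0} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} f(a)\,dH\,da \tag{36}$$
$$= C_0 \int_{a \ge 0} f(a) S_a\,da = C_0 \tag{37}$$

Using the exact same argument:

$$\begin{aligned} \int_{g(H, c_s) \le 0} \|H\|_F\, p(H)\,dH &= \int_{a=0}^{\infty} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} \|H\|_F\, p(H)\,dH\,da \\ &= \int_{a=0}^{\infty} \int_{g(H, c_s) \le 0,\ \|H\|_F = a} a f(a)\,dH\,da \\ &= \int_{a=0}^{\infty} a f(a) C_0 S_a\,da = P(g(H, c_s) \le 0)\,\mathbb{E}[\|H\|_F] \\ &\le \exp\left(-\frac{n}{8}(\pi\epsilon\delta_s + o(1))^2\right) n \end{aligned} \tag{38}$$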

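The last term clearly goes to zero for large $n$. Then we need to calculate the second part, which is:

$$\int_{g(H, c_s) > 0} \sqrt{\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}}\; p(H)\,dH \le \mathbb{E}\left[\sqrt{\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}}\right] \tag{39}$$
$$\le \sqrt{\mathbb{E}\left[\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}\right]} \tag{40}$$

The last inequality is due to the following Cauchy-Schwarz argument. For a random variable (R.V.) $X \ge 0$:

$$\mathbb{E}(X) = \int x\,p(x)\,dx \int p(x)\,dx \ge \left(\int \sqrt{x}\,p(x)\,dx\right)^2 = \mathbb{E}(\sqrt{X})^2 \tag{41}$$

Note that for large $n$ and fixed $c_s = \delta_s n$ and $r = \beta n$ we have

$$\mathbb{E}\left[\sum_{i=c_s+1}^{n} h_i^2 - \frac{\left((h^T z) - \sum_{i=1}^{c_s} h_i\right)^2}{n - c_s}\right] = \left(\gamma_2(1 - \delta_s) - \frac{(\gamma(1 - \delta_s) - 2\gamma(\beta))^2}{1 - \delta_s} + o(1)\right) n^2 \tag{42}$$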
Then, combining (34) and (38), it follows that (42) gives an upper bound for $\mathbb{E}[B_s]^2$ and thereby for $\mathbb{E}[f(H, S_s)]^2$. To be able to calculate the required number of measurements, we need to find $\delta_s$ and substitute it in (42), because (42) will also be an upper bound on the minimum $m$ asymptotically.
If we consider (27), asymptotically $\delta_s$ will be the solution of:

$$(1 - \epsilon)\frac{\gamma(1 - \delta_s) - 2\gamma(\beta)}{1 - \delta_s} = F^{-1}((1 + \epsilon)\delta_s) \tag{43}$$

Then we can substitute this $\delta_s$ in (42) to solve for $m$ (and $\mu$). Using Theorem 1 and (43) we find:

Theorem 2. If $\gamma(1) - 2\gamma(\beta) \le 0$ then $\mu = 1$. Otherwise:

$$\mu > \gamma_2(1 - \delta_s) - \frac{(\gamma(1 - \delta_s) - 2\gamma(\beta))^2}{1 - \delta_s} \tag{44}$$

is a sufficient sampling rate for $\beta$ to be a strong threshold of the random Gaussian operator $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^{\mu n^2}$. Here $\delta_s$ is the solution of:

$$(1 - \epsilon)\frac{\gamma(1 - \delta_s) - 2\gamma(\beta)}{1 - \delta_s} = F^{-1}((1 + \epsilon)\delta_s) \tag{45}$$

In order to get the smallest $\mu$ we let $\epsilon \to 0$. Numerical calculations give the strong threshold in Figure 1. Obviously we found and plotted the least $\mu$ for a given $\beta$ (i.e., equality in (44)).
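The numerical calculation behind the strong threshold curve can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes the $\epsilon \to 0$ reading of Theorem 2, namely that $\delta_s$ solves $(\gamma(1 - \delta_s) - 2\gamma(\beta))/(1 - \delta_s) = F^{-1}(\delta_s)$ and that $\mu = \gamma_2(1 - \delta_s) - (\gamma(1 - \delta_s) - 2\gamma(\beta))^2/(1 - \delta_s)$. It works with $a = F^{-1}(\delta_s)$ and uses closed forms of the quarter circle integrals that are derived here rather than taken from the text; all names are hypothetical.

```python
import math

def F(a):
    """Quarter circle c.d.f. on [0, 2]."""
    a = min(max(a, 0.0), 2.0)
    return (0.5 * a * math.sqrt(4.0 - a * a) + 2.0 * math.asin(0.5 * a)) / math.pi

def F_inv(p, tol=1e-12):
    """Inverse c.d.f. by bisection."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_first_moment(a):
    """Integral of x * phi(x) over [a, 2]; equals gamma(1 - F(a))."""
    return (4.0 - a * a) ** 1.5 / (3.0 * math.pi)

def tail_second_moment(a):
    """Integral of x^2 * phi(x) over [a, 2]; equals gamma_2(1 - F(a))."""
    head = (0.25 * a * (a * a - 2.0) * math.sqrt(4.0 - a * a)
            + 2.0 * math.asin(0.5 * a)) / math.pi
    return 1.0 - head

def strong_threshold_mu(beta):
    """Sampling rate mu for the strong threshold in the limit epsilon -> 0."""
    b = F_inv(1.0 - beta)
    g_beta = tail_first_moment(b)  # gamma(beta)
    if tail_first_moment(0.0) - 2.0 * g_beta <= 0.0:
        return 1.0  # gamma(1) - 2 gamma(beta) <= 0: declare failure
    def psi(a):
        # gamma(1 - delta) - 2 gamma(beta) - a * (1 - delta), with delta = F(a);
        # this function is decreasing in a, positive at 0 and negative at b.
        return tail_first_moment(a) - 2.0 * g_beta - a * (1.0 - F(a))
    lo, hi = 0.0, b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    delta = F(a)
    numerator = tail_first_moment(a) - 2.0 * g_beta
    return tail_second_moment(a) - numerator ** 2 / (1.0 - delta)
```

Sweeping `beta` over a grid and plotting `strong_threshold_mu(beta)` should give a curve of the kind referred to above, under the stated assumptions.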

Next we define and analyze sectional threshold. 



4.2 Sectional Threshold 

Sectional recovery threshold. Let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$ be a random Gaussian operator and let $\{P, Q\}$ be an arbitrary orthogonal projection pair with $\mathrm{rank}(P) = \mathrm{rank}(Q) = \beta n$. Then we say that $\beta$ ($0 \le \beta \le 1$) is a sectional recovery threshold if with high probability $\mathcal{A}$ satisfies the following property:
Any matrix $X$ with support $\{P, Q\}$ can be recovered from measurements $\mathcal{A}(X)$ via (2).

Given a fixed $\beta$, our aim is to calculate the least $\mu$ such that $\beta$ is a sectional threshold for a random Gaussian operator $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^{\mu n^2}$.

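Lemma 10. Given a support $\{P, Q\}$ with $\mathrm{rank}(P) = \mathrm{rank}(Q) = r$, one can recover all matrices $X$ with this support using (2) iff for all $W \in \mathcal{N}(\mathcal{A})$ we have

$$\|(I - P)W(I - Q^T)\|_\star > \|PWQ^T\|_\star \tag{46}$$

Proof. Note that in a suitable basis induced by $\{P, Q\}$ we can write:

$$X = \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix} \tag{47}$$

where $W_{11} = PWQ^T$, $W_{12} = PW(I - Q^T)$, $W_{21} = (I - P)WQ^T$ and $W_{22} = (I - P)W(I - Q^T)$. Now if (46) holds, then using Lemma 3 we immediately have for all $W \in \mathcal{N}(\mathcal{A})$:

$$\|X + W\|_\star = \left\|\begin{bmatrix} X_{11} + W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}\right\|_\star \ge \|X_{11} + W_{11}\|_\star + \|W_{22}\|_\star \ge \|X_{11}\|_\star - \|W_{11}\|_\star + \|W_{22}\|_\star > \|X_{11}\|_\star \tag{48}$$

Hence $X$ is the unique minimizer of (2). In [12], it was proven that (46) is tight in the sense that if there exists $W \in \mathcal{N}(\mathcal{A})$ such that $\|(I - P)W(I - Q^T)\|_\star < \|PWQ^T\|_\star$ then we can find an $X$ with support $\{P, Q\}$ where $X$ is not a minimizer of (2). ■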
Now we can start analyzing the sectional null space condition for the NNM problem. $\mathcal{A}$ is a random Gaussian operator and we'll analyze the linear regime where $m = \mu n^2$ and $r = \beta n$. Similar to compressed sensing, the null space of $\mathcal{A}$ is an $n^2 - m$ dimensional random subspace of $\mathbb{R}^{n^2}$ distributed uniformly in the Grassmanian w.r.t. Haar measure. Then, similar to [18], we have established the necessary framework.

Let $S_{sec}$ be the set of all matrices $W$ such that $\|(I - P)W(I - Q^T)\|_\star \le \|PWQ^T\|_\star$ and $\|W\|_F = 1$. We need to make sure the null space has no intersection with $S_{sec}$. We will first upper bound (16) in Theorem 1 and then choose $m$ (and $\mu$) accordingly. As discussed in [12], without loss of generality we can assume $X = \begin{bmatrix} X_{11} & 0 \\ 0 & 0 \end{bmatrix}$, because $X$ can be transformed to this form with a unitary transformation (which depends only on $\{P, Q\}$), and since the null space is uniformly chosen (i.e., its basis is $n^2 - m$ random matrices chosen i.i.d. from $\mathcal{G}(n, n)$), after this unitary transformation its distribution will still be uniform. The reason is that if $X$ is an i.i.d. Gaussian matrix and $A, B$ are fixed unitary matrices then $AXB$ is still i.i.d. Gaussian. This further shows that the probability of successful recovery does not depend on $\{P, Q\}$ as long as $\beta$ is fixed. With this assumption, $S_{sec}$ is the set of all matrices with $\|W_{22}\|_\star \le \|W_{11}\|_\star$ and $\|W\|_F = 1$. Observe that $W_{11} \in \mathbb{R}^{r \times r}$ and $W_{22} \in \mathbb{R}^{(n - r) \times (n - r)}$.

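In the following we assume $2 \times 2$ block matrices. Let $X_{ij}$ be the $i$'th row and $j$'th column block of $X$. As a first step, given a fixed $H \in \mathbb{R}^{n \times n}$ we'll calculate an upper bound on $f(H, S_{sec}) = \sup_{W \in S_{sec}} \mathrm{vec}(H)^T \mathrm{vec}(W) = \sup_{W \in S_{sec}} \langle H, W \rangle$. Note that $\langle H, W \rangle = \sum_{i,j} \langle H_{ij}, W_{ij} \rangle$.

Further let $h_1 = \Sigma(H_{11})$, $h_2 = \Sigma(H_{22})$, $w_1 = \Sigma(W_{11})$, $w_2 = \Sigma(W_{22})$. Also let $h_3$ be the increasingly sorted absolute values of the entries of the submatrices $H_{12}, H_{21}$, with $w_3$ defined similarly. Finally let $x_{i,j}$ denote the $j$'th entry of the vector $x_i$.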

From Lemma 1 we have:

$$f(H, S_{sec}) = \sup_{W \in S_{sec}} \langle H, W \rangle \le \sup_{W \in S_{sec}} \sum_{i=1}^{3} h_i^T w_i \tag{49}$$

Similarly, one can achieve equality in inequality (49), although we'll not discuss this here. On the other hand, $W \in S_{sec}$ if and only if:

$$\|w_1\|_{\ell_1} \ge \|w_2\|_{\ell_1} \tag{50}$$

We also have $\|w_1\|_{\ell_2}^2 + \|w_2\|_{\ell_2}^2 + \|w_3\|_{\ell_2}^2 = \|W\|_F^2 = 1$. Then we need to solve the following optimization problem (remember that $w_i, h_i \succeq 0 \ \forall i$):



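$$\begin{aligned} \max_{y_1, y_2, y_3} \quad & \sum_{i=1}^{3} h_i^T y_i \\ \text{subject to} \quad & y_i \succeq 0 \ \forall i \\ & \|y_1\|_{\ell_1} \ge \|y_2\|_{\ell_1} \\ & \|y_1\|_{\ell_2}^2 + \|y_2\|_{\ell_2}^2 + \|y_3\|_{\ell_2}^2 \le 1 \end{aligned} \tag{51}$$

Clearly, the right hand side of (49) and the result of (51) are the same, again due to Lemma 1, because sorting the $y_i$'s increasingly will maximize the result. Now we'll rewrite (51) as follows:

$$\begin{aligned} \max_{y_1, y_2, y_3} \quad & a_1 + a_2 \\ \text{subject to} \quad & a_1 = h_1^T y_1 + h_2^T y_2 \\ & a_2 = h_3^T y_3 \\ & y_i \succeq 0 \ \forall i \\ & \|y_1\|_{\ell_1} \ge \|y_2\|_{\ell_1} \\ & \|y_1\|_{\ell_2}^2 + \|y_2\|_{\ell_2}^2 \le E_1 \\ & \|y_3\|_{\ell_2}^2 \le E_2 \\ & E_1 + E_2 \le 1 \end{aligned} \tag{52}$$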

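Now the question is reduced to solving the following two optimization problems and maximizing over them by appropriately distributing $E_1, E_2$:

$$\begin{aligned} \max_{y_1, y_2} \quad & h_1^T y_1 + h_2^T y_2 \\ \text{subject to} \quad & \|y_1\|_{\ell_1} \ge \|y_2\|_{\ell_1} \\ & \|y_1\|_{\ell_2}^2 + \|y_2\|_{\ell_2}^2 \le 1 \end{aligned} \tag{53}$$

$$\begin{aligned} \max_{y_3} \quad & h_3^T y_3 \\ \text{subject to} \quad & \|y_3\|_{\ell_2}^2 \le 1 \end{aligned} \tag{54}$$

Let the result of program (53) be $f_1(H, S_{sec})$ and the result of program (54) be $f_2(H, S_{sec})$. Then clearly the result of program (52) is

$$\begin{aligned} \max \quad & a_1 + a_2 \\ \text{subject to} \quad & a_1 = \sqrt{E_1}\, f_1(H, S_{sec}) \\ & a_2 = \sqrt{E_2}\, f_2(H, S_{sec}) \\ & E_1 + E_2 \le 1 \end{aligned} \tag{55}$$

It is clear that $f_1(H, S_{sec}), f_2(H, S_{sec}) \ge 0$. Then analyzing (55) we get:

$$f(H, S_{sec}) \le a_1 + a_2 = \sqrt{E_1}\, f_1(H, S_{sec}) + \sqrt{E_2}\, f_2(H, S_{sec}) \le \sqrt{E_1 + E_2}\,\sqrt{f_1(H, S_{sec})^2 + f_2(H, S_{sec})^2} \le \sqrt{f_1(H, S_{sec})^2 + f_2(H, S_{sec})^2} \tag{56}$$

Similarly, one can also achieve equality in (56) by letting $E_i = \frac{f_i(H, S_{sec})^2}{f_1(H, S_{sec})^2 + f_2(H, S_{sec})^2}$.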

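Now let us turn to analyzing program (53). Luckily, [18] already gives the following upper bound for this in equation (94). For any $\|h_1\|_{\ell_1} \le \|h_2\|_{\ell_1}$:

$$f_1(H, S_{sec}) \le \sqrt{\|h_1\|_{\ell_2}^2 + \|h_2\|_{\ell_2}^2 - \sum_{i=1}^{c} h_{2,i}^2 - \frac{\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i}\right)^2}{n - c}} \tag{57}$$

for any $c \le n - r$ such that $\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i} \ge (n - c)\,h_{2,c}$.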
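For program (54) we have:

$$f_2(H, S_{sec}) \le h_3^T y_3 = \langle h_3, y_3 \rangle \le \|h_3\|_{\ell_2} \|y_3\|_{\ell_2} \le \|h_3\|_{\ell_2} \tag{58}$$

Combining (56), (57), (58) we find:

$$f(H, S_{sec}) \le \sqrt{\|H\|_F^2 - \sum_{i=1}^{c} h_{2,i}^2 - \frac{\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i}\right)^2}{n - c}} \quad \text{if } \|h_1\|_{\ell_1} \le \|h_2\|_{\ell_1} \tag{59}$$
$$f(H, S_{sec}) \le \|H\|_F \quad \text{else} \tag{60}$$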

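Using (59), for the escape through a mesh (ETM) analysis we'll use the following upper bounding technique:

Lemma 11. Let $H$ be chosen from $\mathcal{G}(n, n)$, let $h_1 = \Sigma(H_{11})$ and $h_2 = \Sigma(H_{22})$, and let $f(H, S_{sec}) = \sup_{W \in S_{sec}} \langle H, W \rangle$. Then we have $f(H, S_{sec}) \le B_{sec}$ where

$$B_{sec} = \begin{cases} \|H\|_F & \text{if } g(H, c_{sec}) \le 0 \\ \sqrt{\|H\|_F^2 - \sum_{i=1}^{c_{sec}} h_{2,i}^2 - \frac{\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i}\right)^2}{n - c_{sec}}} & \text{else} \end{cases}$$

where $g(H, c) = \frac{\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i}}{n - c} - h_{2,c}$ and $c_{sec} = \delta_{sec}\, n(1 - \beta)$ is a $c \le n(1 - \beta)$ such that

$$\begin{cases} c_{sec} = 0 & \text{if } \mathbb{E}[\|h_2\|_{\ell_1}] \le \mathbb{E}[\|h_1\|_{\ell_1}] \\ c_{sec} \text{ is the solution of } (1 - \epsilon)\frac{\mathbb{E}\left[(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1}) - \sum_{i=1}^{c} h_{2,i}\right]}{\sqrt{n(1 - \beta)}\,(n - c)} = F^{-1}\!\left(\frac{(1 + \epsilon)c}{n(1 - \beta)}\right) & \text{if } \mathbb{E}[\|h_2\|_{\ell_1}] > \mathbb{E}[\|h_1\|_{\ell_1}] \end{cases} \tag{61}$$

where $\epsilon > 0$ can be arbitrarily small.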

4.2.1 Probabilistic Analysis of $\mathbb{E}[B_{sec}]$

In order to do the ETM analysis, we choose $H$ from $\mathcal{G}(n, n)$. Clearly $\mathbb{E}[B_{sec}] \ge \mathbb{E}[f(H, S_{sec})]$, hence we need to find an upper bound on $\mathbb{E}[B_{sec}]$. Similar to the probabilistic analysis of the strong threshold, we'll show that with high probability $g(H, c_{sec}) > 0$ whenever $\mathbb{E}[\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1}] > 0$. We'll declare failure otherwise (failure implies $\mu = 1$). Note that when $\mathbb{E}[\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1}] > 0$, $c_{sec} > 0$.

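$$P(g(H, c_{sec}) \le 0) \le P_1 + P_2 \quad \text{where} \tag{62}$$
$$P_1 = P\left(h_{2,c_{sec}} \ge \sqrt{n(1 - \beta)}\,F^{-1}\!\left(\frac{(1 + \epsilon)c_{sec}}{n(1 - \beta)}\right)\right) \tag{63}$$
$$P_2 = P\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i} \le (1 - \epsilon)\,\mathbb{E}\Big[\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i}\Big]\right) \tag{64}$$

Remember that $h_{2,i}$ is the $i$'th smallest singular value of the submatrix $H_{22}$, which is drawn from $\mathcal{G}(n(1 - \beta), n(1 - \beta))$. From the quarter circle distribution it follows that:

$$\mathbb{E}[h_{2,c_{sec}}] = \sqrt{n(1 - \beta)}\,(F^{-1}(\delta_{sec}) + o(1)) \tag{65}$$

Then, similar to the analysis of strong recovery, using Lemmas 4 and 5 and the fact that $H$ is i.i.d. Gaussian, we find

$$P_1 \le \exp\left(-\frac{n(1 - \beta)}{8}(\pi\epsilon\delta_{sec} + o(1))^2\right) \tag{66}$$

Now we'll analyze $P_2$ using Lipschitzness.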
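Lemma 12. Let $f(H) = \|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i}$. Then $f$ is a $\sqrt{n - c_{sec}}$-Lipschitz function of $H$.

Proof. Assume we have $2 \times 2$ block matrices $\tilde{H} = H - \hat{H} \in \mathbb{R}^{n \times n}$ with the upper left block having size $r \times r$. Also let $\|\tilde{H}\|_F = 1$. Then we have

$$1 \ge \|\tilde{H}_{11}\|_F^2 + \|\tilde{H}_{22}\|_F^2 \ge \sum_{i=1}^{r} \sigma_i(\tilde{H}_{11})^2 + \sum_{i=1}^{n - r - c_{sec}} \sigma_i(\tilde{H}_{22})^2 \tag{67}$$
$$\implies \sqrt{n - c_{sec}} \ge \sum_{i=1}^{r} \sigma_i(\tilde{H}_{11}) + \sum_{i=1}^{n - r - c_{sec}} \sigma_i(\tilde{H}_{22}) \tag{68}$$

Now using Lemma 2 we get:

$$\|\tilde{H}_{11}\|_r + \|\tilde{H}_{22}\|_{n - r - c_{sec}} \ge \sum_{i=1}^{r} |\sigma_i(H_{11}) - \sigma_i(\hat{H}_{11})| + \sum_{i=1}^{n - r - c_{sec}} |\sigma_i(H_{22}) - \sigma_i(\hat{H}_{22})| \tag{69}$$
$$\ge |f(H) - f(\hat{H})| \tag{70}$$

Combining all we find $\sqrt{n - c_{sec}} \ge |f(H) - f(\hat{H})|$ as desired. ■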

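Note that asymptotically $\mathbb{E}[f(H)] = ((1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1) + o(1))\,n^{3/2}$, because $h_1, h_2$ are the vectors of singular values of $H_{11}$ and $H_{22}$ respectively. From Lemma 12 and Lemma 5 we find

$$P_2 \le \exp\left(-\frac{\epsilon^2 \left((1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1) + o(1)\right)^2 n^2}{2(1 - \delta_{sec}(1 - \beta))}\right) \tag{71}$$

Finally, we showed that $P(g(H, c_{sec}) \le 0) \le P_1 + P_2$ decays to 0 exponentially fast as $n \to \infty$. Then we use the following upper bound for $\mathbb{E}[B_{sec}]$: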



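$$\mathbb{E}[B_{sec}] \le \int_{g(H, c_{sec}) \le 0} \|H\|_F\, p(H)\,dH + \int_{g(H, c_{sec}) > 0} \sqrt{\|H\|_F^2 - \sum_{i=1}^{c_{sec}} h_{2,i}^2 - \frac{\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i}\right)^2}{n - c_{sec}}}\; p(H)\,dH \tag{72}$$

Using exactly the same arguments as in [18] and (38) we have:

$$\int_{g(H, c_{sec}) \le 0} \|H\|_F\, p(H)\,dH \le \exp\left(-\frac{n(1 - \beta)}{8}(\pi\epsilon\delta_{sec} + o(1))^2\right) n \to 0 \tag{73}$$

as $n \to \infty$. Secondly, using $\sqrt{\mathbb{E}[X]} \ge \mathbb{E}[\sqrt{X}]$ for any R.V. $X \ge 0$ we have:

$$\mathbb{E}\left[\sqrt{\|H\|_F^2 - \sum_{i=1}^{c_{sec}} h_{2,i}^2 - \frac{\left(\|h_2\|_{\ell_1} - \|h_1\|_{\ell_1} - \sum_{i=1}^{c_{sec}} h_{2,i}\right)^2}{n - c_{sec}}}\right] \tag{74}$$
$$\le n\sqrt{1 - (1 - \beta)^2(1 - \gamma_2(1 - \delta_{sec})) - \frac{\left((1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1)\right)^2}{1 - \delta_{sec}(1 - \beta)} + o(1)} \tag{75}$$

Combining (73), (74) we find that asymptotically the right hand side of (74) is an upper bound for $\mathbb{E}[B_{sec}]$. Using this we can conclude: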

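Theorem 3. If $\beta \ge \frac{1}{2}$ then $\mu = 1$. Otherwise:

$$\mu > 1 - (1 - \beta)^2(1 - \gamma_2(1 - \delta_{sec})) - \frac{\left((1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1)\right)^2}{1 - \delta_{sec}(1 - \beta)} \tag{76}$$

is a sufficient sampling rate for $\beta$ to be a sectional threshold of the Gaussian operator $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^{\mu n^2}$, where $0 < \delta_{sec} < 1$ is the solution of (from (61)):

$$(1 - \epsilon)\frac{(1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1)}{\sqrt{1 - \beta}\,(1 - \delta_{sec}(1 - \beta))} = F^{-1}((1 + \epsilon)\delta_{sec}) \tag{77}$$

In order to find the least $\mu$ we let $\epsilon \to 0$. Numerical calculations result in sectional threshold of Figure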

©• 
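The sectional threshold can be computed numerically in the same spirit as the strong one. The Python sketch below is an illustration under explicit assumptions: it takes the $\epsilon \to 0$ reading of Theorem 3 in which $\delta_{sec}$ solves $\frac{(1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1)}{\sqrt{1 - \beta}\,(1 - \delta_{sec}(1 - \beta))} = F^{-1}(\delta_{sec})$ and $\mu = 1 - (1 - \beta)^2(1 - \gamma_2(1 - \delta_{sec})) - \frac{((1 - \beta)^{3/2}\gamma(1 - \delta_{sec}) - \beta^{3/2}\gamma(1))^2}{1 - \delta_{sec}(1 - \beta)}$. The closed forms of the quarter circle integrals and all function names are introduced here for illustration.

```python
import math

def F(a):
    """Quarter circle c.d.f. on [0, 2]."""
    a = min(max(a, 0.0), 2.0)
    return (0.5 * a * math.sqrt(4.0 - a * a) + 2.0 * math.asin(0.5 * a)) / math.pi

def tail_first_moment(a):
    """Integral of x * phi(x) over [a, 2]; equals gamma(1 - F(a))."""
    return (4.0 - a * a) ** 1.5 / (3.0 * math.pi)

def tail_second_moment(a):
    """Integral of x^2 * phi(x) over [a, 2]; equals gamma_2(1 - F(a))."""
    head = (0.25 * a * (a * a - 2.0) * math.sqrt(4.0 - a * a)
            + 2.0 * math.asin(0.5 * a)) / math.pi
    return 1.0 - head

def sectional_threshold_mu(beta):
    """Sampling rate mu for the sectional threshold in the limit epsilon -> 0."""
    if beta >= 0.5:
        return 1.0  # E||h_2||_1 <= E||h_1||_1: declare failure
    s = 1.0 - beta
    gamma_one = tail_first_moment(0.0)  # gamma(1) = 8 / (3 pi)
    def numerator(a):
        return s ** 1.5 * tail_first_moment(a) - beta ** 1.5 * gamma_one
    def psi(a):
        # Difference of the two sides of the delta_sec equation, written in
        # terms of a = F^{-1}(delta_sec); decreasing in a on [0, 2].
        return numerator(a) - a * math.sqrt(s) * (1.0 - F(a) * s)
    lo, hi = 0.0, 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    delta = F(a)
    return (1.0 - s * s * (1.0 - tail_second_moment(a))
            - numerator(a) ** 2 / (1.0 - delta * s))
```

Evaluating `sectional_threshold_mu` on a grid of `beta` values below 1/2 traces out the sectional curve under these assumptions; the value is fixed to 1 for `beta >= 0.5`, matching the failure case of the theorem.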

4.3 Weak Threshold 

In this section, we'll derive the relation between $\mu$ and $\beta$ for the weak threshold described below.

Weak recovery threshold. Let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$ be a random Gaussian operator and let $X \in \mathbb{R}^{n \times n}$ be an arbitrary matrix with $\mathrm{rank}(X) = \beta n$. We say that $\beta$ is a weak recovery threshold if with high probability this particular matrix $X$ can be recovered from measurements $\mathcal{A}(X)$ via program (2).

We remark that the weak threshold is the one that can be observed from simulations. The strong (and sectional) thresholds cannot, because there is no way to check the recovery of all low rank $X$ (or all $X$ of a particular support). In this sense, the weak threshold is the most important.

Again, given a fixed $\beta$, we aim to find the least $\mu$ such that $\beta$ is a weak threshold for a random Gaussian operator $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^{\mu n^2}$. To avoid repetition, we'll be more concise in this section, because many of the derivations repeat those for the strong and sectional thresholds.



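Lemma 13. Let $X \in \mathbb{R}^{n\times n}$ with $\mathrm{rank}(X) = r$ and SVD $X = U\Sigma V^T$ with $\Sigma \in \mathbb{R}^{r\times r}$. Then it can be recovered using (2) iff for all $W \in \mathcal{N}(\mathcal{A})$ we have

$$\mathrm{trace}(U^T W V) + \|\bar{U}^T W \bar{V}\|_* > 0 \tag{78}$$

where $\bar{U}, \bar{V}$ are such that $[U\ \bar{U}]$ and $[V\ \bar{V}]$ are unitary.

Proof. Since singular values are unitarily invariant, if (78) holds then, using the lemma that bounds the nuclear norm of a block matrix from below by the nuclear norms of its diagonal blocks, we have

$$\|X + W\|_* = \bigl\|[U\ \bar{U}]^T (X + W) [V\ \bar{V}]\bigr\|_* = \left\| \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} U^T W V & U^T W \bar{V} \\ \bar{U}^T W V & \bar{U}^T W \bar{V} \end{bmatrix} \right\|_* \tag{79}$$

$$\ge \|\Sigma + U^T W V\|_* + \|\bar{U}^T W \bar{V}\|_* \ge \mathrm{trace}(\Sigma + U^T W V) + \|\bar{U}^T W \bar{V}\|_* \tag{80}$$

$$= \|X\|_* + \mathrm{trace}(U^T W V) + \|\bar{U}^T W \bar{V}\|_* > \|X\|_* \tag{81}$$

Hence $X$ is the unique minimizer of program (2). It was also shown in prior work that condition (78) is tight, in the sense that if there is a $W \in \mathcal{N}(\mathcal{A})$ such that $\mathrm{trace}(U^T W V) + \|\bar{U}^T W \bar{V}\|_* < 0$, then $X$ is not a minimizer of (2).

Note that condition (78) is independent of the singular values of $X$. This suggests that not only $X$ but also all matrices with the same left and right singular vectors $U, V$ are recoverable via (2).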
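The chain (79)-(81) can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `weak_condition_value` and `nuclear_norm` are illustrative and not from the paper. It evaluates the left-hand side of (78) for a given pair $(X, W)$ and compares it with the actual change in nuclear norm, which by (79)-(81) can never be smaller.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def weak_condition_value(X, W, tol=1e-10):
    """Return trace(U^T W V) + ||Ubar^T W Vbar||_* for the SVD X = U S V^T.

    Assumes rank(X) < n so that Ubar and Vbar are non-empty.
    """
    # With full_matrices=True the factors are square and orthogonal, so the
    # trailing columns complete U and V to [U Ubar] and [V Vbar].
    U_full, s, Vt_full = np.linalg.svd(X, full_matrices=True)
    r = int(np.sum(s > tol * max(s[0], 1.0)))      # numerical rank of X
    U, U_bar = U_full[:, :r], U_full[:, r:]
    V, V_bar = Vt_full[:r, :].T, Vt_full[r:, :].T
    return np.trace(U.T @ W @ V) + nuclear_norm(U_bar.T @ W @ V_bar)

rng = np.random.default_rng(0)
n, r = 8, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r matrix
W = rng.standard_normal((n, n))                                  # arbitrary perturbation
value = weak_condition_value(X, W)
gap = nuclear_norm(X + W) - nuclear_norm(X)   # (79)-(81) give gap >= value
```

In particular, whenever `value` is positive the perturbed matrix $X + W$ has strictly larger nuclear norm than $X$, which is the content of Lemma 13.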

Analyzing the condition: $\mathcal{A}$ is a random Gaussian operator and we'll analyze the linear regime where $m = \mu n^2$ and $r = \beta n$ with $\beta > 0$.

Let $S_w$ be the set of all matrices $W$ such that $\mathrm{trace}(U^T W V) + \|\bar{U}^T W \bar{V}\|_* \le 0$ and $\|W\|_F = 1$. We need to make sure that the null space $\mathcal{N}(\mathcal{A})$ has no intersection with $S_w$. We first upper bound (16) in Theorem 1, then choose $m$ (and $\mu$) accordingly. As discussed in [12], without loss of generality we can assume

$$X = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}$$

where $\Sigma \in \mathbb{R}^{r\times r}$ is a diagonal matrix with positive diagonal. This is because any $X = U\Sigma V^T$ can be transformed into this form by the unitary transformation $[U\ \bar{U}]^T X [V\ \bar{V}]$, and since the null space is uniformly chosen (i.e. its basis is $n^2 - m$ random matrices chosen iid from $\mathcal{G}(n,n)$), its distribution will still be uniform after the unitary transformation. Then $S_w$ can be assumed to be the set of matrices with $\mathrm{trace}(W_{11}) + \|W_{22}\|_* \le 0$ and $\|W\|_F = 1$.

Again we assume $2\times 2$ block matrices. Firstly, given a fixed $H \in \mathbb{R}^{n\times n}$ we'll calculate an upper bound on $f(H, S_w) = \sup_{W\in S_w} \langle H, W\rangle$.

Let $\mathbf{h}_1 = \mathrm{diag}(H_{11})$, i.e. the diagonal entries of $H_{11}$, let $\mathbf{h}_2 = \Sigma(H_{22})$, and let $\mathbf{h}_3$ be the increasingly sorted absolute values of the remaining entries, which are the entries of $H_{12}$, $H_{21}$ and the off-diagonal entries of $H_{11}$. $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ are defined similarly. Also let $x_{i,j}$ denote the $j$'th entry of the vector $\mathbf{x}_i$.

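From Lemma 1 we have:

$$f(H, S_w) = \sup_{W\in S_w} \langle H, W\rangle \le \sup_{W\in S_w} \sum_{i=1}^{3} \mathbf{h}_i^T \mathbf{w}_i \tag{82}$$

We introduce the following notation. Let $s(\mathbf{x})$ denote the sum of the entries of $\mathbf{x}$, i.e. $s(\mathbf{x}) = \sum_i x_i$. Then $W \in S_w$ if and only if:

$$s(\mathbf{w}_1) + \|\mathbf{w}_2\|_{\ell_1} \le 0 \quad \text{and} \tag{83}$$

$$\|\mathbf{w}_1\|_{\ell_2}^2 + \|\mathbf{w}_2\|_{\ell_2}^2 + \|\mathbf{w}_3\|_{\ell_2}^2 = \|W\|_F^2 = 1 \tag{84}$$

Then we need to solve the following equivalent optimization problem given $H$ (note that $\mathbf{w}_i, \mathbf{h}_i \succeq 0$ for all $2 \le i \le 3$):

$$\max_{\mathbf{y}_1,\mathbf{y}_2,\mathbf{y}_3} \sum_{i=1}^{3} \mathbf{h}_i^T \mathbf{y}_i \tag{85}$$

subject to

$$\mathbf{y}_2, \mathbf{y}_3 \succeq 0, \qquad s(\mathbf{y}_1) + \|\mathbf{y}_2\|_{\ell_1} \le 0, \qquad \|\mathbf{y}_1\|_{\ell_2}^2 + \|\mathbf{y}_2\|_{\ell_2}^2 + \|\mathbf{y}_3\|_{\ell_2}^2 \le 1$$

The right-hand side of (82) and the output of (85) are the same. We'll rewrite (85) as follows:

$$\max_{\mathbf{y}_1,\mathbf{y}_2,\mathbf{y}_3} a_1 + a_2 \tag{86}$$

subject to

$$a_1 = \mathbf{h}_1^T \mathbf{y}_1 + \mathbf{h}_2^T \mathbf{y}_2, \qquad a_2 = \mathbf{h}_3^T \mathbf{y}_3, \qquad \mathbf{y}_2, \mathbf{y}_3 \succeq 0,$$

$$s(\mathbf{y}_1) + \|\mathbf{y}_2\|_{\ell_1} \le 0, \qquad \|\mathbf{y}_1\|_{\ell_2}^2 + \|\mathbf{y}_2\|_{\ell_2}^2 \le E_1, \qquad \|\mathbf{y}_3\|_{\ell_2}^2 \le E_2, \qquad E_1 + E_2 \le 1$$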

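Note that (86) is essentially the same as (67) of [18]. Basically (86) has the additional terms $\mathbf{h}_3, \mathbf{y}_3$, and $y_{1,i}$ corresponds to $-y_{n-r+i}$ of [18] for $1 \le i \le r$ while $y_{2,j}$ corresponds to $y_j$ of [18] for $1 \le j \le n-r$. Then, repeating exactly the same steps that come before Lemma 11 and equation (59), and using (67), (68) of [18], we find:

Lemma 14.

$$f(H, S_w) \le \sqrt{\|H\|_F^2 - \sum_{i=1}^{c} h_{2,i}^2 - \frac{\bigl(s(\mathbf{h}_1) + \|\mathbf{h}_2\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i}\bigr)^2}{n-c}} \quad \text{if } s(\mathbf{h}_1) + \|\mathbf{h}_2\|_{\ell_1} > 0 \tag{87}$$

$$f(H, S_w) \le \|H\|_F \quad \text{else}$$

for any $0 \le c \le n-r$ such that $s(\mathbf{h}_1) + \|\mathbf{h}_2\|_{\ell_1} - \sum_{i=1}^{c} h_{2,i} \ge (n-c)\, h_{2,c}$.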
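The bound of Lemma 14 only depends on the three vectors $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3$, so it is easy to evaluate. Below is a minimal sketch in plain Python; the helper names (`lemma14_bound`, `random_feasible_point`, `objective`) are hypothetical. It takes the best admissible $c$, treating the admissibility condition as vacuous for $c = 0$ (an assumption, since $h_{2,0}$ is not defined), and then compares the bound with the objective of program (85) at randomly drawn feasible points.

```python
import math
import random

def lemma14_bound(h1, h2, h3):
    """Smallest bound of the form (87) over admissible c.

    h1: diagonal of H11 (any sign), h2: singular values of H22 sorted
    increasingly, h3: absolute values of the remaining entries of H.
    """
    if not h1:
        raise ValueError("h1 must be non-empty (rank r >= 1)")
    fro_sq = sum(x * x for x in h1) + sum(x * x for x in h2) + sum(x * x for x in h3)
    total = sum(h1) + sum(h2)              # s(h1) + ||h2||_1, since h2 >= 0
    if total <= 0:
        return math.sqrt(fro_sq)           # trivial bound ||H||_F
    n = len(h1) + len(h2)
    best = fro_sq
    head_sum = head_sq = 0.0               # sums over the c smallest entries of h2
    for c in range(len(h2) + 1):
        if c > 0:
            head_sum += h2[c - 1]
            head_sq += h2[c - 1] ** 2
        remainder = total - head_sum
        # admissibility: remainder >= (n - c) * h_{2,c}; vacuous for c = 0
        if c == 0 or remainder >= (n - c) * h2[c - 1]:
            best = min(best, fro_sq - head_sq - remainder ** 2 / (n - c))
    return math.sqrt(max(best, 0.0))

def random_feasible_point(r, k, p, rng):
    """Draw (y1, y2, y3) satisfying the constraints of program (85)."""
    y1 = [rng.gauss(0, 1) for _ in range(r)]
    y2 = [abs(rng.gauss(0, 1)) for _ in range(k)]
    y3 = [abs(rng.gauss(0, 1)) for _ in range(p)]
    shift = max(0.0, (sum(y1) + sum(y2)) / r)      # enforce s(y1) + s(y2) <= 0
    y1 = [v - shift for v in y1]
    norm = math.sqrt(sum(v * v for v in y1 + y2 + y3))
    return [[v / norm for v in part] for part in (y1, y2, y3)]

def objective(h_parts, y_parts):
    """Objective of (85): sum of the inner products h_i^T y_i."""
    return sum(a * b for h, y in zip(h_parts, y_parts) for a, b in zip(h, y))

h1, h2, h3 = [0.5, -1.0], [0.1, 0.7, 1.5], [0.2, 0.3, 0.9]
bound = lemma14_bound(h1, h2, h3)
rng = random.Random(0)
worst = max(objective((h1, h2, h3), random_feasible_point(2, 3, 3, rng))
            for _ in range(1000))
```

Random search only probes the feasible set, so `worst` staying below `bound` is a sanity check of the inequality, not a proof of tightness.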
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Based on Lemma [T^ for ETM analysis we'll use the following lemma: 



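Lemma 15. Let $H$ be chosen from $\mathcal{G}(n,n)$, let $\mathbf{h}_1 = \mathrm{diag}(H_{11})$, $\mathbf{h}_2 = \Sigma(H_{22})$ and $f(H, S_w) = \sup_{W\in S_w} \langle H, W\rangle$. Then we have $f(H, S_w) \le B_w$ where

$$B_w = \|H\|_F \quad \text{if } g(H, c_w) \le 0,$$

$$B_w = \sqrt{\|H\|_F^2 - \sum_{i=1}^{c_w} h_{2,i}^2 - \frac{\bigl(s(\mathbf{h}_1) + \|\mathbf{h}_2\|_{\ell_1} - \sum_{i=1}^{c_w} h_{2,i}\bigr)^2}{n - c_w}} \quad \text{else} \tag{88}$$

where $g(H, c) = \dfrac{\|\mathbf{h}_2\|_{\ell_1} + s(\mathbf{h}_1) - \sum_{i=1}^{c} h_{2,i}}{n-c} - h_{2,c}$ and $c_w = \delta_w n(1-\beta)$ is a $c \le n(1-\beta)$ such that

$c_w = 0$ if $\mathbb{E}[\|\mathbf{h}_2\|_{\ell_1} + s(\mathbf{h}_1)] \le 0$; otherwise $c_w$ is the solution of

$$\frac{(1-\epsilon)\,\mathbb{E}\bigl[\|\mathbf{h}_2\|_{\ell_1} + s(\mathbf{h}_1) - \sum_{i=1}^{c} h_{2,i}\bigr]}{\sqrt{n(1-\beta)}\,(n-c)} = F^{-1}\!\left(\frac{(1+\epsilon)\,c}{n(1-\beta)}\right) \tag{89}$$

where $\epsilon > 0$ can be arbitrarily small.

Note that, when $H$ is iid Gaussian, for any $\beta < 1$ we have $\mathbb{E}[\|\mathbf{h}_2\|_{\ell_1} + s(\mathbf{h}_1)] > 0$, since the entries of $\mathbf{h}_1$ are iid Gaussian, hence $\mathbb{E}[s(\mathbf{h}_1)] = 0$, and clearly $\mathbb{E}[\|\mathbf{h}_2\|_{\ell_1}] > 0$. As a result $c_w > 0$ too.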

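4.3.1 Probabilistic Analysis of $\mathbb{E}[B_w]$

Similar to the previous analysis, $H$ is drawn from $\mathcal{G}(n,n)$ and in order to use Theorem 1 we need to upper bound $\mathbb{E}[B_w]$. Using the same steps and letting $f(H) = \|\mathbf{h}_2\|_{\ell_1} + s(\mathbf{h}_1) - \sum_{i=1}^{c_w} h_{2,i}$:

$$P(g(H, c_w) \le 0) \le P_1 + P_2 \quad \text{where} \tag{90}$$

$$P_1 = P\Bigl(h_{2,c_w} \ge \sqrt{n(1-\beta)}\,F^{-1}\bigl((1+\epsilon)\delta_w\bigr)\Bigr) \tag{91}$$

$$P_2 = P\bigl(f(H) \le (1-\epsilon)\,\mathbb{E}[f(H)]\bigr) \tag{92}$$

An upper bound for $P_1$ was already given in (66). Also, similar to Lemma 12, one can show that $f(H)$ is a $\sqrt{n - c_w}$-Lipschitz function of $H$. Therefore $P(g(H, c_w) \le 0)$ approaches 0 exponentially fast in $n$. Combining this with the same arguments as before yields:

$$\mathbb{E}[B_w] \le n\sqrt{1 - (1-\beta)^2\bigl(1 - \gamma_2(1-\delta_w)\bigr) - \frac{(1-\beta)^3\,\gamma(1-\delta_w)^2}{1 - \delta_w(1-\beta)}} + o(1) \tag{93}$$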

Then using Theorem 1 and (89) we can write:

Theorem 4.

$$\mu \ge 1 - (1-\beta)^2\bigl(1 - \gamma_2(1-\delta_w)\bigr) - \frac{(1-\beta)^3\,\gamma(1-\delta_w)^2}{1 - \delta_w(1-\beta)} \tag{94}$$

is a sufficient sampling rate for $\beta$ to be a weak threshold of the Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n^2}$. Here $0 < \delta_w < 1$ is the solution of equations (95) and (96).

To find the least $\mu$ we let $\epsilon \to 0$. Numerical calculations result in the weak threshold of Figure 1.

Figure 1 (horizontal axis: $\mu$, required sampling): Results of [12] versus the results we get by using an escape through a mesh (ETM) analysis similar to [18] (for square matrices). Here $\theta$ is the model complexity, i.e. the degrees of freedom of the matrix ($\theta = \beta(2-\beta)$). This plot gives the efficiency of nuclear norm minimization as a function of the number of samples $\mu$. It shows, at a given $\mu$, how much more one should oversample the content of the matrix to perform NNM successfully. Simulations are done for $40 \times 40$ matrices and program (2) is solved with Gaussian measurements. Our weak threshold and simulations match almost exactly. Black regions indicate failure and white regions mean success. Due to low precision (40 is small), we did not include simulation results for $\mu < 0.1$.
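To make the notion of model complexity concrete: an $n \times n$ matrix of rank $r$ has $r(2n - r)$ degrees of freedom, which equals $n^2 \beta(2-\beta)$ for $r = \beta n$. A small sketch follows; the function names are illustrative only.

```python
def degrees_of_freedom(n, r):
    """Degrees of freedom of an n x n matrix of rank r: r * (2n - r)."""
    return r * (2 * n - r)

def oversampling_ratio(m, n, r):
    """Number of measurements m divided by the model complexity."""
    return m / degrees_of_freedom(n, r)

n, r = 40, 8                      # the simulation size used for Figure 1
beta = r / n
theta = beta * (2 - beta)         # normalized complexity: theta * n^2 = r(2n - r)
ratio = oversampling_ratio(3 * degrees_of_freedom(n, r), n, r)
```

With this normalization, "three times oversampling" means $m = 3\,r(2n-r)$ measurements, i.e. $\mu = 3\theta$.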



5 Thresholds for Positive Semidefinite Matrices 
5.1 Additional Notations and Lemmas 

Before starting our analysis, we'll briefly introduce some more notations and lemmas.

$S^n$ denotes the set of Hermitian (real and symmetric) matrices of size $n \times n$. Similarly $S^n_+$ denotes the set of positive semidefinite matrices. PSD stands for positive semidefinite.

Let $\mathcal{N}_s(\mathcal{A}) \subseteq \mathcal{N}(\mathcal{A})$ denote the subspace of the null space of $\mathcal{A}$ which consists of Hermitian matrices.

Denote the Gaussian unitary ensemble by $\mathcal{D}(n)$, which is the ensemble of Hermitian matrices of size $n \times n$ with independent Gaussian entries in the lower triangular part, where off-diagonal entries have variance 1 and diagonal entries have variance 2. In order to create such a matrix $B$ one can choose a matrix $A$ from $\mathcal{G}(n,n)$ and then let $B = (A + A^T)/\sqrt{2}$.
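The construction above is easy to reproduce. The following sketch assumes NumPy; `sample_symmetric_gaussian` is a hypothetical helper name. It draws one matrix from the ensemble and rescales its eigenvalues by $1/\sqrt{n}$, the normalization under which the spectrum is expected to fill the interval $[-2, 2]$.

```python
import numpy as np

def sample_symmetric_gaussian(n, rng):
    """Draw B = (A + A^T) / sqrt(2) with A having iid standard normal entries.

    Off-diagonal entries of B then have variance 1 and diagonal entries
    have variance 2, as in the description of the ensemble.
    """
    A = rng.standard_normal((n, n))
    return (A + A.T) / np.sqrt(2.0)

rng = np.random.default_rng(0)
n = 200
D = sample_symmetric_gaussian(n, rng)
scaled_eigs = np.linalg.eigvalsh(D) / np.sqrt(n)   # eigenvalues normalized by 1/sqrt(n)
```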

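Let $X \in S^n$ with $\mathrm{rank}(X) = r$. Then the (skinny) eigenvalue decomposition (EVD) of $X$ is $X = U\Lambda U^T$ for some partial unitary $U \in \mathbb{R}^{n\times r}$ and diagonal matrix $\Lambda \in \mathbb{R}^{r\times r}$. Denote the $i$'th largest eigenvalue of $X$ by $\lambda_i(X)$ for $1 \le i \le n$. Let $\Lambda(X)$ denote the increasingly ordered eigenvalues of $X \in S^n$, i.e. $\Lambda(X) = [\lambda_n(X) \dots \lambda_1(X)]$. Also note that the singular values of $X$ correspond to the absolute values of the eigenvalues of $X$.

Let $c = 2^{-1/2}$. If $A \in S^n$ is a symmetric real matrix, define $\mathrm{vec}(A) \in \mathbb{R}^{n(n+1)/2}$ to be the following vector:

$$\mathrm{vec}(A) = \frac{1}{c}\,[\,cA_{1,1}\ A_{2,1}\ \cdots\ A_{n,1}\ \ cA_{2,2}\ A_{3,2}\ \cdots\ A_{n,2}\ \ cA_{3,3}\ A_{4,3}\ \cdots\ cA_{n-1,n-1}\ A_{n,n-1}\ \ cA_{n,n}\,]^T \tag{97}$$

In other words, for each $i$ we let $\mathbf{b}_i = [cA_{i,i}\ A_{i+1,i}\ \dots\ A_{n,i}]/c$ and we let $\mathrm{vec}(A) = [\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_n]^T$. Now note that for any $A, B \in S^n$ we have

$$\langle A, B\rangle = \sum_{i,j} A_{i,j}B_{i,j} = \sum_{i} A_{i,i}B_{i,i} + 2\sum_{i>j} A_{i,j}B_{i,j} = \mathrm{vec}(A)^T \mathrm{vec}(B) \tag{98}$$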
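A small sketch of this vectorization, assuming NumPy (the names `sym_vec` and `sym_ivec` are illustrative): off-diagonal entries of the lower triangle are scaled by $1/c = \sqrt{2}$ while diagonal entries are kept as they are, which is exactly what makes the inner product identity (98) hold.

```python
import numpy as np

def sym_vec(A):
    """Stack the lower triangle of symmetric A column by column as in (97)."""
    n = A.shape[0]
    parts = []
    for i in range(n):
        col = np.sqrt(2.0) * A[i:, i]   # off-diagonal entries scaled by sqrt(2)
        col[0] = A[i, i]                # diagonal entry kept unscaled
        parts.append(col)
    return np.concatenate(parts)

def sym_ivec(v, n):
    """Inverse of sym_vec: rebuild the symmetric n x n matrix."""
    A = np.zeros((n, n))
    pos = 0
    for i in range(n):
        length = n - i
        col = v[pos:pos + length] / np.sqrt(2.0)
        A[i:, i] = col                  # column below the diagonal
        A[i, i:] = col                  # mirrored row
        A[i, i] = v[pos]                # restore the unscaled diagonal entry
        pos += length
    return A

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T
B = rng.standard_normal((n, n)); B = B + B.T
```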



Clearly $\mathrm{vec}(\cdot) : S^n \to \mathbb{R}^{n(n+1)/2}$ is bijective. Then let $\mathrm{ivec}(\cdot)$ denote the inverse of the function $\mathrm{vec}(\cdot)$. Also it is clear that $\mathrm{vec}(\cdot)$ is linear.

Let $\eta_-(X), \eta_+(X), \eta_0(X)$ denote the number of negative, positive and zero eigenvalues of $X$. The triple $(\eta_-(X), \eta_0(X), \eta_+(X))$ is called the inertia of $X$.

Similar to $\mathcal{G}(n,n)$, the following limits exist for a random matrix $X$ drawn from $\mathcal{D}(n)$: the histogram of the eigenvalues of $X$ normalized by $1/\sqrt{n}$ converges to the semicircle distribution given by:



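$$\phi_s(x) = \frac{\sqrt{4 - x^2}}{2\pi} \quad \text{for } -2 \le x \le 2 \tag{99}$$

$$\phi_s(x) = 0 \quad \text{else.} \tag{100}$$

For $-2 \le x \le 2$, define the cumulative distribution function corresponding to the semicircle law as:

$$F_s(x) = \int_{-2}^{x} \phi_s(t)\,dt \tag{101}$$

Similar to the definitions of $\gamma(\cdot)$ and $\gamma_2(\cdot)$ we can define $\gamma_s(\beta)$ and $\gamma_{2,s}(\beta)$ for $0 \le \beta \le 1$:

$$\gamma_s(\beta) = \lim_{n\to\infty} \frac{\mathbb{E}\bigl[\sum_{i=1}^{\beta n} \lambda_i(X)\bigr]}{n^{3/2}} \tag{102}$$

$$\gamma_{2,s}(\beta) = \lim_{n\to\infty} \frac{\mathbb{E}\bigl[\sum_{i=1}^{\beta n} \lambda_i(X)^2\bigr]}{n^{2}} \tag{103}$$

Using the definitions of $F(\cdot)$ and $F_s(\cdot)$ it is easy to see that

$$\gamma_s(\beta) = \frac{\gamma(2\beta)}{2} \quad \text{for } \beta < 0.5 \tag{104}$$

$$\gamma_s(\beta) = \frac{\gamma(2 - 2\beta)}{2} \quad \text{for } 0.5 \le \beta \le 1 \tag{105}$$

Similarly we have

$$\gamma_{2,s}(\beta) = \frac{\gamma_2(2\beta)}{2} \quad \text{for } \beta < 0.5 \tag{106}$$

$$\gamma_{2,s}(\beta) = 1 - \frac{\gamma_2(2 - 2\beta)}{2} \quad \text{for } 0.5 \le \beta \le 1 \tag{107}$$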
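The semicircle quantities used in the threshold computations can be evaluated in closed form. The sketch below uses only the standard library; the function names are illustrative. The closed form for `gamma_s` is an assumption: it takes $\gamma_s(\beta)$ to be the limit of the normalized sum of the $\beta n$ largest eigenvalues, i.e. $\int_{q}^{2} x\,\phi_s(x)\,dx$ with $q = F_s^{-1}(1-\beta)$.

```python
import math

def semicircle_pdf(x):
    """Semicircle density on [-2, 2]."""
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi) if -2.0 <= x <= 2.0 else 0.0

def semicircle_cdf(x):
    """Closed form of F_s(x), the integral of the density from -2 to x."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def semicircle_quantile(p, tol=1e-12):
    """Inverse of F_s by bisection (F_s is increasing on [-2, 2])."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_s(beta):
    """Integral of x * pdf(x) over the top beta fraction of the spectrum
    (assumed meaning of gamma_s; see the lead-in)."""
    q = semicircle_quantile(1.0 - beta)
    return max(4.0 - q * q, 0.0) ** 1.5 / (6.0 * math.pi)
```

Under this assumption $\gamma_s(1) = 0$, which matches the fact that the eigenvalues of a matrix drawn from $\mathcal{D}(n)$ sum to zero in expectation.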

We'll need the following lemmas about eigenvalues of matrices:

Lemma 16. Let $X, Y \in S^n$, $s_i(X,Y) = |\lambda_i(X) - \lambda_i(Y)|$ and let $s_{[1]}(X,Y) \ge s_{[2]}(X,Y) \ge \dots \ge s_{[n]}(X,Y)$ be a decreasingly ordered arrangement of $\{s_i(X,Y)\}_{i=1}^{n}$. Then the following inequality holds:

$$\sum_{i=1}^{k} s_{[i]}(X,Y) \le \sum_{i=1}^{k} \sigma_i(X - Y) \tag{108}$$

for any $1 \le k \le n$. This is the eigenvalue counterpart of the corresponding inequality for singular values.

Lemma 17. $\lambda_i(X)$ is a 1-Lipschitz function of $X$ (and $\mathrm{vec}(X)$).

Proof. Given $X, Y$ we have:

$$\|X - Y\|_F \ge \sigma_1(X - Y) \ge s_{[1]}(X,Y) \ge s_i(X,Y) = |\lambda_i(X) - \lambda_i(Y)| \tag{109}$$

■

Lemma 18. If $X, Y \in \mathbb{R}^{n\times n}$ are positive semidefinite matrices then $\mathrm{trace}(XY) \ge 0$.

Proof. Let $A, B \in \mathbb{R}^{n\times n}$ be arbitrary square roots of $X, Y$ respectively. In other words $A^T A = X$ and $B^T B = Y$. Since $X, Y$ are PSD, $A, B$ exist. Then we can write:

$$\mathrm{trace}(XY) = \mathrm{trace}(A^T A B^T B) = \mathrm{trace}(A B^T B A^T) = \mathrm{trace}\bigl(A B^T (A B^T)^T\bigr) = \|A B^T\|_F^2 \ge 0 \tag{110}$$

■

Lemma 19. Given $A, B \in S^n$ we have the following inequalities due to [24]:

$$\eta_+(A) - \eta_-(B) \le \eta_+(A + B) \le \eta_+(A) + \eta_+(B) \tag{111}$$

$$\eta_-(A) - \eta_+(B) \le \eta_-(A + B) \le \eta_-(A) + \eta_-(B) \tag{112}$$

5.2 PSD Recovery Methods 

Now we'll state and analyze null space conditions for the success of the following program, which is equivalent to nuclear norm minimization for PSD matrices. Let $X_0 \in S^n_+$ be a (low rank) matrix. Then we want $X_0$ to be the unique solution of the following program:

$$\min\ \mathrm{trace}(X) \tag{113}$$

$$\text{subject to}\quad \mathcal{A}(X) = \mathcal{A}(X_0),\quad X \succeq 0 \tag{114}$$

This is equivalent to (2) because $\mathrm{trace}(X_0) = \sum_{i=1}^{n} \lambda_i(X_0) = \|X_0\|_*$, since eigenvalues and singular values are the same for PSD matrices. Similar to the previous discussion, the measurement operator is random Gaussian.

In addition to this, we'll state the results for the following program, where we want $X_0$ to be the unique positive semidefinite solution satisfying the measurements $\mathcal{A}(X_0)$:

$$\text{find}\ X \tag{115}$$

$$\text{subject to}\quad \mathcal{A}(X) = \mathcal{A}(X_0),\quad X \succeq 0 \tag{116}$$

However we'll omit the analysis for this, because it is very similar to that of program (113) and actually simpler.



5.3 PSD Weak Threshold 

PSD Weak threshold. $\beta$ is called a PSD weak threshold for a random Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$ if, given a fixed $X \in S^n_+$ with $\mathrm{rank}(X) = \beta n$, $X$ can be recovered from the measurements $\mathcal{A}(X)$ via (113) asymptotically with probability 1.

For a given $\beta$, our aim is to find the least $\mu \le 1$ so that $\beta$ is a weak threshold for $\mathcal{A}$.

Lemma 20. Let $X \in S^n_+$ be a rank $r$ matrix with EVD $X = U\Lambda U^T$ and $\Lambda \in \mathbb{R}^{r\times r}$. Then $X$ is the unique minimizer of (113) if for all $W \in \mathcal{N}(\mathcal{A})$ we have:

$$W \text{ is not Hermitian, or} \tag{117}$$

$$\mathrm{trace}(W) > 0, \text{ or} \tag{118}$$

$$\eta_-(\bar{U}^T W \bar{U}) > 0 \tag{119}$$

where $[U\ \bar{U}]$ is a unitary matrix.

Proof. If $W$ is not Hermitian then $X + W$ is not Hermitian, thus not PSD. If $\mathrm{trace}(W) > 0$ then $\mathrm{trace}(X + W) > \mathrm{trace}(X)$ as desired. On the other hand, if $\bar{U}^T W \bar{U}$ has a negative eigenvalue, we can write

$$Y := [U\ \bar{U}]^T (X + W) [U\ \bar{U}] = \begin{bmatrix} \Lambda + U^T W U & U^T W \bar{U} \\ \bar{U}^T W U & \bar{U}^T W \bar{U} \end{bmatrix} \tag{120}$$

which means that the lower right submatrix $\bar{U}^T W \bar{U}$ of $Y$ (which is a principal submatrix) is not PSD. Then it immediately follows that $Y$ is not PSD, because we can find a vector $v \in \mathbb{R}^n$ to make $v^T Y v < 0$. Then $X + W$ is not PSD, as it can be obtained by unitarily transforming $Y$ (i.e. $[U\ \bar{U}]\,Y\,[U\ \bar{U}]^T$), which preserves eigenvalues. Then as long as $W$ satisfies one of (117), (118), (119), $X + W$ cannot be a minimizer, hence $X$ is the unique minimizer. ■
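The three conditions of Lemma 20 can be tested for a concrete perturbation $W$. Below is a minimal sketch assuming NumPy; the function name `psd_weak_certificate` and the returned labels are illustrative. It returns the first of (117)-(119) that $W$ satisfies, or `None` if $W$ satisfies none of them.

```python
import numpy as np

def psd_weak_certificate(X, W, tol=1e-9):
    """Return "(117)", "(118)" or "(119)" if W satisfies that condition
    for the PSD matrix X, and None if it satisfies none of them."""
    if not np.allclose(W, W.T, atol=tol):
        return "(117)"                       # W is not symmetric
    if np.trace(W) > tol:
        return "(118)"                       # positive trace
    eigvals, eigvecs = np.linalg.eigh(X)
    # Columns of U_bar span the null space of X (eigenvalues that are numerically zero).
    U_bar = eigvecs[:, eigvals <= tol * max(eigvals.max(), 1.0)]
    if U_bar.shape[1] > 0 and np.linalg.eigvalsh(U_bar.T @ W @ U_bar).min() < -tol:
        return "(119)"                       # negative eigenvalue on the null space
    return None

X = np.diag([1.0, 1.0, 0.0, 0.0])
W_bad = np.diag([0.0, 0.0, -1.0, 0.5])       # makes X + W indefinite
W_good = np.diag([-0.5, 0.0, 0.2, 0.1])      # X + W stays PSD with smaller trace
```

For `W_good` none of the conditions holds, and indeed $X + W$ is PSD with a smaller trace, so $X$ would not be the minimizer if such a $W$ were in the null space.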

One can also give the if and only if condition for PSD weak recovery. Without proof, we'll state the difference from Lemma 20:

For all $W \in \mathcal{N}(\mathcal{A})$ we should have: (117) or (118) or (119) or "the column space of $\bar{U}^T W U$ is not a subset of the column space of $\bar{U}^T W \bar{U}$".

However, this last condition (in quotes) does not have any effect on our ETM analysis. The reason (again without proof) is that with an arbitrarily small perturbation we can make $\bar{U}^T W \bar{U}$ full rank while not changing the other properties of $W$ at all, hence the last condition becomes obsolete.



Lemma 21. Conditions (117), (118), (119) are also sufficient to guarantee sectional recovery. In other words, given an $X = U\Lambda U^T$, if (117), (118), (119) hold for all $W \in \mathcal{N}(\mathcal{A})$, then in addition to the recoverability of $X$, we can recover all PSD matrices $Y$ with support $\{UU^T, UU^T\}$ from the measurements $\mathcal{A}(Y)$ with (113).

Proof. Any PSD $Y$ with support $UU^T$ can be written as $Y = U_Y \Lambda_Y U_Y^T$ with $U_Y U_Y^T = UU^T$. Now, assume (117), (118), (119) hold. Then, if we have $\eta_-(\bar{U}_Y^T W \bar{U}_Y) > 0$ whenever $\eta_-(\bar{U}^T W \bar{U}) > 0$, using Lemma 20 we are done, because all conditions for $Y$ become satisfied. As a result, it remains to show: $\eta_-(\bar{U}_Y^T W \bar{U}_Y) > 0 \iff \eta_-(\bar{U}^T W \bar{U}) > 0$.

Proof. Let $v \in \mathbb{R}^{n-r}$ be such that $v^T \bar{U}^T W \bar{U} v < 0$. Then, since the column spaces of $\bar{U}$ and $\bar{U}_Y$ are the same, we can choose $v_2 = \bar{U}_Y^T \bar{U} v$ so that $\bar{U}_Y v_2 = \bar{U} v$ and $v_2^T \bar{U}_Y^T W \bar{U}_Y v_2 < 0$. ■

This result suggests that there is no need to analyze the sectional condition separately, because the results we get for weak recovery will also work for sectional recovery.

Now we'll start the null space analysis. Let $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{m}$ be a random Gaussian operator where $m = \mu n(n+1)/2$ ($0 \le \mu \le 1$). In [23] it was argued that the distribution of $\mathcal{N}_s(\mathcal{A})$ (the null space restricted to Hermitian matrices) is equivalent to a subspace having the matrices $\{D_i\}_{i=1}^{(1-\mu)n(n+1)/2}$ as basis, where $\{D_i\}_{i=1}^{(1-\mu)n(n+1)/2}$ is drawn iid from $\mathcal{D}(n)$. This is easy to see when we consider $\mathcal{A}(\cdot)$ as a mapping from the lower triangular entries to $\mathbb{R}^{m}$.

This also implies that the distribution of $\mathcal{N}_s(\mathcal{A})$ is unitarily invariant, because if $D$ is chosen from $\mathcal{D}(n)$ then for a fixed unitary matrix $V$, $VDV^T$ and $D$ have the same distribution (identical random variables).

Proof. $D$ is equivalent to $(G + G^T)/\sqrt{2}$ where $G$ is chosen from $\mathcal{G}(n,n)$. Then $VDV^T$ is equivalent to $(VGV^T + (VGV^T)^T)/\sqrt{2}$. Now using the fact that the distribution of $VGV^T$ is equivalent to that of $G$, we end up with the desired result. ■

Let $X = U\Lambda U^T$ be given where $\mathrm{rank}(X) = r = \beta n$. Similar to the previous analysis, let $S_{wp}$ denote the set of Hermitian matrices $W$ such that $\mathrm{trace}(W) \le 0$, $\eta_-(\bar{U}^T W \bar{U}) = 0$ and $\|W\|_F = 1$. Since $\mathcal{N}_s(\mathcal{A})$ is unitarily invariant we can assume $X$ is diagonal and

$$X = \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix}$$

Now the condition $\eta_-(\bar{U}^T W \bar{U}) = 0$ can be replaced by $\eta_-(W_{22}) = 0$. We want to make sure that $\mathcal{N}_s(\mathcal{A})$ does not intersect $S_{wp}$, so that the null space conditions (117)-(119) will be satisfied.

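Note that this can be rewritten in the following way, which will enable us to use Theorem 1. Let $\mathrm{vec}(\mathcal{N}_s(\mathcal{A}))$ be the subspace formed by applying $\mathrm{vec}(\cdot)$ to the elements of $\mathcal{N}_s(\mathcal{A})$. Then its distribution is equivalent to a subspace in $\mathbb{R}^{n(n+1)/2}$ having basis $\{\mathbf{d}_i\}_{i=1}^{(1-\mu)n(n+1)/2}$, where the $\mathbf{d}_i$'s are iid vectors drawn from $\mathcal{G}(n(n+1)/2, 1)$. This is because the random vectors $\{\mathrm{vec}(D_i)\}_i$ are equivalent to $\{\mathbf{d}_i\}_i$ scaled by $\sqrt{2}$. Similarly let $\mathrm{vec}(S_{wp})$ be the set of vectors obtained by applying $\mathrm{vec}(\cdot)$ to the elements of $S_{wp}$.

Then, because $\mathrm{vec}(\cdot)$ is linear, $\mathcal{N}_s(\mathcal{A}) \cap S_{wp} = \emptyset \iff \mathrm{vec}(\mathcal{N}_s(\mathcal{A})) \cap \mathrm{vec}(S_{wp}) = \emptyset$. Let $\mathbf{h}$ be drawn from $\mathcal{G}(n(n+1)/2, 1)$ and let $D$ be drawn from $\mathcal{D}(n)$. Since $\mathrm{vec}(\cdot)$ also preserves inner products, in order to use Theorem 1 as previously we just need to calculate:

$$\omega(S_{wp}) = \mathbb{E}\Bigl[\sup_{\mathbf{w}\in \mathrm{vec}(S_{wp})} \mathbf{h}^T \mathbf{w}\Bigr] = \frac{1}{\sqrt{2}}\,\mathbb{E}\Bigl[\sup_{W\in S_{wp}} \langle D, W\rangle\Bigr] \tag{121}$$

Let $H$ be Hermitian and define $f(H, S_{wp}) = \sup_{W\in S_{wp}} \langle H, W\rangle$. Then we'll first upper bound $f(H, S_{wp})$ and then take the expectation of the upper bound as previously. Let $s(\mathbf{x})$ denote the sum of the entries of the vector $\mathbf{x}$.

Let $\mathbf{h}_1$ denote the diagonal entries of $H_{11}$, let $\mathbf{h}_2 = \Lambda(H_{2,2})$ and let $\mathbf{h}_3$ denote the increasingly ordered absolute values of the entries of $H_{12}$, $H_{21}$ and the off-diagonal entries of $H_{11}$. $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ are defined similarly for $W$. Note that $W \in S_{wp}$ if and only if

$$\mathbf{w}_2 \succeq 0 \quad (\text{i.e. } W_{2,2} \text{ PSD}) \tag{122}$$

$$s(\mathbf{w}_1) + s(\mathbf{w}_2) \le 0 \tag{123}$$

$$\|\mathbf{w}_1\|_{\ell_2}^2 + \|\mathbf{w}_2\|_{\ell_2}^2 + \|\mathbf{w}_3\|_{\ell_2}^2 = 1 \tag{124}$$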

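Now using Lemma 1 we write:

$$\langle H, W\rangle \le \mathbf{h}_1^T \mathbf{w}_1 + \mathbf{h}_3^T \mathbf{w}_3 + \langle H_{22}, W_{22}\rangle \tag{125}$$

Let $H_{2,2} = H_{2,2}^{+} - H_{2,2}^{-}$ where both $H_{2,2}^{+}, H_{2,2}^{-} \succeq 0$. Then from Lemma 18 we get $\mathrm{trace}(W_{2,2} H_{2,2}^{-}) \ge 0$ and from Lemma 1 we get $\mathrm{trace}(W_{2,2} H_{2,2}^{+}) \le \sum_{i=1}^{\eta_+(H_{2,2})} \lambda_i(W_{2,2})\lambda_i(H_{2,2}^{+})$. Combining these we find:

$$\langle H_{22}, W_{22}\rangle \le \sum_{i=1}^{\eta_+(H_{2,2})} \lambda_i(W_{2,2})\,\lambda_i(H_{2,2}^{+}) \tag{126}$$

Clearly $\lambda_i(H_{2,2}^{+}) = \lambda_i(H_{2,2})$ for any $i \le \eta_+(H_{2,2})$. Now let $\mathbf{h}_4, \mathbf{w}_4$ be the increasingly ordered first (largest) $\eta_+(H_{2,2})$ eigenvalues of $H_{2,2}, W_{2,2}$ respectively. Then clearly $\|\mathbf{w}_4\|_{\ell_2} \le \|\mathbf{w}_2\|_{\ell_2}$ and $s(\mathbf{w}_4) \le s(\mathbf{w}_2)$ since $W_{2,2}$ is PSD. Then an upper bound for $\langle H, W\rangle$ is $\mathbf{h}_1^T \mathbf{w}_1 + \mathbf{h}_3^T \mathbf{w}_3 + \mathbf{h}_4^T \mathbf{w}_4$.

Remember that $\mathbf{h}_3, \mathbf{h}_4, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4 \succeq 0$ in the previous discussion. Then the solution of the following program gives an upper bound for $f(H, S_{wp})$:

$$\max_{\mathbf{y}_1,\mathbf{y}_3,\mathbf{y}_4} \mathbf{h}_1^T \mathbf{y}_1 + \mathbf{h}_3^T \mathbf{y}_3 + \mathbf{h}_4^T \mathbf{y}_4 \tag{127}$$

subject to

$$\mathbf{y}_3, \mathbf{y}_4 \succeq 0, \qquad s(\mathbf{y}_1) + s(\mathbf{y}_4) \le 0, \qquad \|\mathbf{y}_1\|_{\ell_2}^2 + \|\mathbf{y}_3\|_{\ell_2}^2 + \|\mathbf{y}_4\|_{\ell_2}^2 \le 1$$

Similarly, the following program also yields the same upper bound:

$$\max_{\mathbf{y}_1,\mathbf{y}_2,\mathbf{y}_3} \mathbf{h}_1^T \mathbf{y}_1 + \mathbf{h}_2^T \mathbf{y}_2 + \mathbf{h}_3^T \mathbf{y}_3 \tag{128}$$

subject to

$$\mathbf{y}_2, \mathbf{y}_3 \succeq 0, \qquad s(\mathbf{y}_1) + s(\mathbf{y}_2) \le 0, \qquad \|\mathbf{y}_1\|_{\ell_2}^2 + \|\mathbf{y}_2\|_{\ell_2}^2 + \|\mathbf{y}_3\|_{\ell_2}^2 \le 1$$

The difference is $\mathbf{h}_2$; however, the largest $\eta_+(H_{2,2})$ entries of $\mathbf{h}_2$ (i.e. the positive entries) give $\mathbf{h}_4$ and the remaining entries are nonpositive. Then programs (127) and (128) give the same result, since in order to maximize $\mathbf{h}_2^T \mathbf{y}_2$ one should set $y_{2,i} = 0$ whenever $h_{2,i} \le 0$, because we need to satisfy $\mathbf{y}_2 \succeq 0$. In other words, for any $\mathbf{y}_2 \succeq 0$, the vector $\mathbf{v} = [y_{2,1} \dots y_{2,\eta_+(H_{2,2})}\ 0 \dots 0]^T$ is feasible and yields a better or equal result, because we'll have $\mathbf{h}_2^T \mathbf{v} \ge \mathbf{h}_2^T \mathbf{y}_2$, $s(\mathbf{v}) \le s(\mathbf{y}_2)$ and $\|\mathbf{v}\|_{\ell_2} \le \|\mathbf{y}_2\|_{\ell_2}$. Consequently (128) reduces to (127).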

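Now note that (128) is exactly the same as program (115) of [18] except for the additional terms $\mathbf{h}_3, \mathbf{y}_3$. Then using (116) of [18], repeating the steps before (59) and using $\|\mathbf{h}_1\|_{\ell_2}^2 + \|\mathbf{h}_2\|_{\ell_2}^2 + \|\mathbf{h}_3\|_{\ell_2}^2 = \|H\|_F^2$, we end up with:

Lemma 22.

$$f(H, S_{wp}) \le \sqrt{\|H\|_F^2 - \sum_{i=1}^{c} h_{2,i}^2 - \frac{\bigl(s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{2,i}\bigr)^2}{n-c}} \tag{129}$$

for any $0 \le c \le n-r$ such that $s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{2,i} \ge (n-c)\,h_{2,c}$. If there is no such $c$ then $f(H, S_{wp}) \le \|H\|_F$.

Based on Lemma 22, for the probabilistic analysis we'll use the following lemma:

Lemma 23. Let $H$ be chosen from $\mathcal{D}(n)$ and let $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3$ be the vectors described previously. Then we have $f(H, S_{wp}) \le B_{wp}$ where

$$B_{wp} = \|H\|_F \quad \text{if } g(H, c_{wp}) \le 0,$$

$$B_{wp} = \sqrt{\|H\|_F^2 - \sum_{i=1}^{c_{wp}} h_{2,i}^2 - \frac{\bigl(s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c_{wp}} h_{2,i}\bigr)^2}{n - c_{wp}}} \quad \text{else} \tag{130}$$

where $g(H, c) = \dfrac{s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{2,i}}{n-c} - h_{2,c}$ and $c_{wp} = \delta_{wp}\, n(1-\beta)$ is a $c \le n(1-\beta)$ such that $c_{wp}$ is the solution of

$$\frac{(1-\epsilon)\,\mathbb{E}\bigl[s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{2,i}\bigr]}{\sqrt{n(1-\beta)}\,(n-c)} = F_s^{-1}\!\left(\frac{(1+\epsilon)\,c}{n(1-\beta)}\right) \tag{131}$$

where $\epsilon > 0$ can be arbitrarily small.

Note that $c_{wp} > 0$ for any $\beta < 1$, since we have $\mathbb{E}[s(\mathbf{h}_1)] = 0$ and $\mathbb{E}[s(\mathbf{h}_2)] \ge 0$.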

5.3.1 Probabilistic Analysis for $\mathbb{E}[B_{wp}]$

Similar to the probabilistic analysis for the previous cases, we can use Lemma 16 to show Lipschitzness of the function $s(\mathbf{h}_1) + s(\mathbf{h}_2) - \sum_{i=1}^{c_{wp}} h_{2,i}$. The proof follows the exact same steps as Lemma 12. Then, using this and Lemmas 17 and 5, we can conclude that $P(g(H, c_{wp}) \le 0)$ decays to 0 exponentially fast ($\exp(-O(n))$). As a result, for $\mathbb{E}[B_{wp}]$ we have the following upper bound, obtained by taking the expectation of the right-hand side of (130):

$$\mathbb{E}[B_{wp}] \le n\sqrt{1 - (1-\beta)^2\bigl(\gamma_{2,s}(1) - \gamma_{2,s}(1-\delta_{wp})\bigr) - \frac{(1-\beta)^3\,\gamma_s(1-\delta_{wp})^2}{1 - (1-\beta)\delta_{wp}} + o(1)} + o(1) \tag{132}$$

Then using Theorem 1 and $\gamma_{2,s}(1) = 1$, we can conclude that:

Theorem 5.

$$\mu \ge 1 - (1-\beta)^2\bigl(1 - \gamma_{2,s}(1-\delta_{wp})\bigr) - \frac{(1-\beta)^3\,\gamma_s(1-\delta_{wp})^2}{1 - (1-\beta)\delta_{wp}} \tag{133}$$

is a sufficient sampling rate for $\beta$ to be a PSD weak threshold of the Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$. Here, due to (131), $\delta_{wp}$ is the solution of:

$$\frac{(1-\epsilon)(1-\beta)\,\gamma_s(1-\delta)}{1 - (1-\beta)\delta} = F_s^{-1}\bigl((1+\epsilon)\delta\bigr) \tag{134}$$

The corresponding number of samples will be $m = \mu n(n+1)/2$. Also $F_s(\cdot)$ is the c.d.f. of the semicircle distribution defined previously.

Remember that we choose the smallest such $\mu$ to plot the curves. The result is given in Figure 2 as "Trace Minimization Weak".

5.3.2 Alternative Analysis 

One can also directly analyze program (127), because it is exactly the same as (85). It would give
Lemma 24. 



f{H, Swp) < 



\ 



|2 



i=l 



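for any $0 \le c \le t - r$ such that $s(\mathbf{h}_1) + s(\mathbf{h}_4) - \sum_{i=1}^{c} h_{4,i} \ge (t-c)\,h_{4,c}$, where $t = t(\mathbf{h}_1, \mathbf{h}_4) = r + \eta_+(H_{2,2})$ is the sum of the dimensions of the vectors $\mathbf{h}_1$ and $\mathbf{h}_4$. If there is no such $c$ then $f(H, S_{wp}) \le \|H\|_F$.

Note that $t$ is not deterministic; however, when $H$ is drawn from $\mathcal{D}(n)$ we have $\mathbb{E}[t] = n\bigl(\beta + \tfrac{1-\beta}{2}\bigr) = \tfrac{n(1+\beta)}{2}$, because clearly half of the eigenvalues of $H_{2,2}$ are positive in expectation. Furthermore, from Lemmas 5 and 17 it immediately follows that $t$ will concentrate around $\mathbb{E}[t]$, because for any $\epsilon$ we can write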

P(t _ !^(1±^ > n{l - P)e) = P(A„(i_^)(i/2+,)(i?2,2) > 0) < exp(-^^ii^F7i(l/2 - ef) (136) 

(note that $F_s^{-1}(1/2) = 0$). Then asymptotically $t/n$ will be approximately constant. Consequently we can probabilistically analyze Lemma 24 in a similar manner to the previous cases; however, we have to deal with more details. At the end, we get the following, which comes from the asymptotic expectation of the right-hand side of (135):

Lemma 25. 

is a sufficient sampling rate for $\beta$ to be a PSD weak threshold of the Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$. Here $\delta_{wp}$ is the solution of

l + ;3-(l-/3)5 -Vl-^^ (138) 
This formulation is nicer, since we did not use the additional functions $F_s$, $\gamma_s$, $\gamma_{2,s}$.

5.4 PSD Strong Threshold



Now we'll analyze the strong threshold for positive semidefinite matrices.

PSD Strong threshold. We say $\beta$ is a PSD strong threshold for the Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{m}$ ($m = \mu n(n+1)/2$) if $\mathcal{A}$ satisfies the following condition asymptotically with probability 1:

Any positive semidefinite matrix $X$ of rank at most $\beta n$ can be recovered from the measurements $\mathcal{A}(X)$ via (113).

Lemma 26. Any $X \in S^n_+$ of rank at most $r$ can be recovered from the measurements $\mathcal{A}(X)$ via (113) if and only if any $W \in \mathcal{N}(\mathcal{A})$ satisfies one of the following properties:

$$W \text{ is not Hermitian, or} \tag{139}$$

$$\mathrm{trace}(W) > 0, \text{ or} \tag{140}$$

$$\eta_-(W) > r \tag{141}$$

Proof. If one of the first two holds, then either $X + W$ is not PSD or $\mathrm{trace}(X + W) > \mathrm{trace}(X)$, so $X + W$ cannot be a minimizer. On the other hand, if the third property holds then from Lemma 19 we find:

$$\eta_-(X + W) \ge \eta_-(W) - \eta_+(X) \ge r + 1 - \mathrm{rank}(X) > 0 \tag{142}$$

hence $X + W$ is not PSD. So $X$ will be the unique minimizer if this is true for all $W$.

Conversely, if there is a $W$ which satisfies none of the properties, then write $W = W_+ - W_-$ where $W_+, W_-$ are PSD. Let $X = W_-$. Clearly $\mathrm{rank}(X) = \eta_-(W) \le r$; however $X + W$ is PSD and $\mathrm{trace}(X + W) \le \mathrm{trace}(X)$, hence $X$ is not the unique minimizer. ■
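The converse direction of the proof is constructive and can be illustrated numerically. The sketch below assumes NumPy; `inertia` and `negative_part` are hypothetical helper names. Given a symmetric $W$ with nonpositive trace and at most $r$ negative eigenvalues, it builds $X = W_-$ and checks that $X + W$ is a PSD competitor whose trace is not larger.

```python
import numpy as np

def inertia(W, tol=1e-9):
    """Return (#negative, #zero, #positive) eigenvalues of symmetric W."""
    lam = np.linalg.eigvalsh(W)
    return (int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)),
            int(np.sum(lam > tol)))

def negative_part(W):
    """Return the PSD matrix W_- in the decomposition W = W_+ - W_-."""
    lam, Q = np.linalg.eigh(W)
    return (Q * np.maximum(-lam, 0.0)) @ Q.T   # scales column j of Q by max(-lam_j, 0)

W = np.diag([1.0, -2.0, 0.0, 0.0])   # symmetric, trace <= 0, one negative eigenvalue
X = negative_part(W)                 # rank(X) = eta_-(W) = 1
competitor = X + W                   # equals W_+, hence PSD
```

Here $r = 1$ suffices for the failure: the rank one matrix $X$ is not the unique minimizer, because `competitor` is feasible whenever $W$ lies in the null space and has smaller trace.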

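Now we'll analyze this condition similarly to the weak threshold for PSD matrices, with $r = \beta n$ and $m = \mu n(n+1)/2$. Let $S_{sp}$ be the set of Hermitian matrices $W$ such that $\mathrm{trace}(W) \le 0$, $\eta_-(W) \le r$ and $\|W\|_F = 1$. We don't want $S_{sp}$ to intersect $\mathcal{N}_s(\mathcal{A})$. Similar to the previous analysis, we need to calculate $\mathbb{E}[\sup_{W\in S_{sp}} \langle H, W\rangle]$ to find the minimum sampling rate which ensures that the intersection will be empty with high probability.

Suppose $H \in S^n$ is fixed. Then let us calculate an upper bound $B_{sp}$ of $f(H, S_{sp}) = \sup_{W\in S_{sp}} \langle H, W\rangle$. If $\eta_-(H) \le r$, we'll set $B_{sp} = \|H\|_F$, which is the obvious bound.

Otherwise from Lemma 18 we find:

$$\langle H, W\rangle \le \langle H_-, W_-\rangle + \langle H_+, W_+\rangle \tag{143}$$

Let $c_+ = \min\{\eta_+(H), \eta_+(W)\}$ and $c_- = \min\{\eta_-(H), \eta_-(W)\}$. Then from Lemma 1:

$$\langle H, W\rangle \le u(H, W) := \sum_{i=1}^{c_+} \lambda_i(H_+)\lambda_i(W_+) + \sum_{i=1}^{c_-} \lambda_i(H_-)\lambda_i(W_-) \tag{144}$$

To upper bound $f(H, S_{sp})$ let us maximize $u(H, W)$ over $S_{sp}$. Let $W \in S_{sp}$; then we have $\eta_-(H) > r \ge \eta_-(W)$.

Let $\mathbf{h}_1, \mathbf{w}_1 \in \mathbb{R}^{\eta_+(H)}$ be the vectors of the increasingly ordered largest $\eta_+(H)$ eigenvalues of $H_+, W_+$ respectively. Similarly let $\mathbf{h}_2, \mathbf{w}_2 \in \mathbb{R}^{r}$ be the vectors of the increasingly ordered largest $r$ eigenvalues of $H_-, W_-$. Since $\eta_+(H) \ge c_+$ and $r \ge c_-$ we can write:

$$\mathbf{h}_1^T \mathbf{w}_1 + \mathbf{h}_2^T \mathbf{w}_2 = u(H, W) \tag{145}$$

Note that $\mathbf{w}_1, \mathbf{w}_2$ have to satisfy: $\mathbf{w}_1, \mathbf{w}_2 \succeq 0$, $s(\mathbf{w}_1) \le s(\mathbf{w}_2)$ and $\|\mathbf{w}_1\|_{\ell_2}^2 + \|\mathbf{w}_2\|_{\ell_2}^2 \le \|W\|_F^2 = 1$. The middle one is due to $s(\mathbf{w}_1) \le \mathrm{trace}(W_+)$, $s(\mathbf{w}_2) = \mathrm{trace}(W_-)$ and $\mathrm{trace}(W) = \mathrm{trace}(W_+) - \mathrm{trace}(W_-) \le 0$. Also we have $\mathbf{h}_1, \mathbf{h}_2 \succeq 0$. Then the following optimization program gives $\sup_{W\in S_{sp}} u(H, W)$:

$$\max_{\mathbf{y}_1,\mathbf{y}_2} \mathbf{h}_1^T \mathbf{y}_1 + \mathbf{h}_2^T \mathbf{y}_2 \tag{146}$$

subject to

$$\mathbf{y}_1, \mathbf{y}_2 \succeq 0, \qquad s(\mathbf{y}_1) \le s(\mathbf{y}_2), \qquad \|\mathbf{y}_1\|_{\ell_2}^2 + \|\mathbf{y}_2\|_{\ell_2}^2 \le 1$$

Note that this is exactly the same as program (24). As a result we can write the following lemma:

Lemma 27. Let $t = \eta_+(H) + r$.

$$f(H, S_{sp}) \le \|H\|_F \quad \text{if } \eta_-(H) \le r \text{ or } s(\mathbf{h}_1) \le s(\mathbf{h}_2) \tag{147}$$

$$f(H, S_{sp}) \le \sqrt{\|\mathbf{h}_1\|_{\ell_2}^2 + \|\mathbf{h}_2\|_{\ell_2}^2 - \sum_{i=1}^{c} h_{1,i}^2 - \frac{\bigl(s(\mathbf{h}_1) - s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{1,i}\bigr)^2}{t-c}} \quad \text{else} \tag{148}$$

where $c \le \eta_+(H)$ is such that $s(\mathbf{h}_1) - s(\mathbf{h}_2) - \sum_{i=1}^{c} h_{1,i} \ge (t-c)\,h_{1,c}$.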

We'll not give the detailed ETM analysis for this case, as it requires more meticulous analysis. However, for any $r = \beta n$ with $\beta < 1/2$ it is easy to show that, when $H$ is chosen from $\mathcal{D}(n)$, we'll have

$$P\bigl(\eta_-(H) \le \beta n \ \text{or}\ s(\mathbf{h}_1) \le s(\mathbf{h}_2)\bigr) \to 0 \tag{149}$$

exponentially fast with $n$. The reason is that $\mathbb{E}[\eta_-(H)] = n/2 > \beta n$ and $\mathbb{E}[s(\mathbf{h}_1) - s(\mathbf{h}_2)] = \frac{n^{3/2}}{2}\bigl(\gamma(1) - \gamma(2\beta)\bigr) > 0$. Similar to the previous cases, using Lipschitzness of the functions and Gaussianity of $H$ yields the result. Then essentially we need to analyze (148). Using exactly the same arguments we can also show $\mathbb{E}[t] = n(\beta + 1/2)$ and that $t$ will concentrate around its mean (as $n \to \infty$). As a result, except for minor details, the probabilistic analysis becomes similar to the ones before (i.e. where $t$ is constant). At the end we get:

Theorem 6. If $\beta \ge 1/2$ then $\mu = 1$. When $\beta < 1/2$ we have:

$$\mu \ge \frac{1}{2}\left(\gamma_2(1-\delta_{sp}) + \gamma_2(2\beta) - \frac{\bigl(\gamma(1-\delta_{sp}) - \gamma(2\beta)\bigr)^2}{2\beta + 1 - \delta_{sp}}\right) \tag{150}$$

is a sufficient sampling rate for $\beta$ to be a PSD strong threshold of the Gaussian operator $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$. Here $\delta_{sp}$ is the solution of

$$\frac{\gamma(1-\delta) - \gamma(2\beta)}{2\beta + 1 - \delta} = F^{-1}(\delta) \tag{151}$$

The result is given in Figure 2 as "Trace Minimization Strong".
5.5 Uniqueness Results 

In this part, we'll state, without proof, the conditions and results for the unique PSD solution to the measurements. They follow immediately from slight modifications of the previous analysis.

5.5.1 Weak Uniqueness

Uniqueness Weak threshold. Let $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{m}$ be a random Gaussian operator and let $X$ be an arbitrary PSD matrix with $\mathrm{rank}(X) = \beta n$. We say that $\beta$ is a uniqueness weak threshold if with high probability this particular matrix $X$ can be recovered from the measurements $\mathcal{A}(X)$ via program (115).

Lemma 28. Let $X$ be a PSD matrix with $\mathrm{rank}(X) = r$ and eigenvalue decomposition $U\Lambda U^T$ with $\Lambda \in \mathbb{R}^{r\times r}$. Then $X$ can be recovered via (115) if for all $W \in \mathcal{N}_s(\mathcal{A})$, $\bar{U}^T W \bar{U}$ has a negative eigenvalue.

Lemma 29. Let $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$ be a Gaussian operator. Then $\beta$ is a weak uniqueness threshold if

$$\mu \ge 1 - \frac{(1-\beta)^2}{2} \tag{152}$$

5.5.2 Strong Uniqueness

Uniqueness Strong threshold. Let $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{m}$ be a random Gaussian operator. We say that $\beta$ is a uniqueness strong threshold if with high probability all PSD matrices $X$ with rank at most $\beta n$ can be recovered from their measurements $\mathcal{A}(X)$ via program (115).

Lemma 30. All PSD matrices of rank at most $r$ can be recovered via program (115) if and only if all $W \in \mathcal{N}_s(\mathcal{A})$ have at least $r + 1$ negative eigenvalues.

Lemma 31. Let $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^{\mu n(n+1)/2}$ be a Gaussian operator. Then $\beta$ is a strong uniqueness threshold if

$$\mu = 1 \quad \text{if } \beta \ge 0.5 \tag{153}$$

$$\mu = \frac{1 + \gamma_2(2\beta)}{2} \quad \text{if } \beta < 0.5 \tag{154}$$

Curves for the weak and strong uniqueness thresholds are given in Figure 2 as "Unique PSD Weak/Strong".



6 Discussion and Future Works 

In this work we classified the various types of matrix recovery, gave tight conditions for them and analyzed the conditions for Gaussian measurements to get better thresholds than the existing results of [4] and [12]. It turns out that the thresholds of [4, 12] actually correspond to a special, suboptimal case of our analysis. In the lemmas that introduce $\delta_s, \delta_{sec}, \delta_w$, if instead of choosing these parameters carefully we just set them to 0, we end up with the results of [4, 12]. This suggests that, although the analysis of this paper is more tedious, it is strictly better than the previous ones and also generalizes them.

Although we did not argue much about the tightness of our results, most of the estimates and inequalities that are used for the upper bounds are actually tight or asymptotically tight. In particular we believe our weak thresholds are exact, similar to the significant results of [18]. A key to the results of the paper is the fact that we have written down the null space conditions in their most transparent form. Essentially, the null space vectors of compressed sensing are replaced by the singular values of the null space matrix in NNM. This allowed us to use the approach of [18] directly. This furthermore suggests that the NNM problem is a generalization of compressed sensing and the two problems are very similar in nature.

Our simulation results support our belief that our weak thresholds are tight. Also, simulation results and theoretical curves suggest that at most 3 times oversampling is necessary for weak recovery for any $0 \le \beta \le 1$, and around 8 times is required for strong recovery. This is important as it means one can solve the RM problem via convex optimization with a very small sampling cost. Furthermore, although our results are in the asymptotic case ($r = \beta n$), theory and simulation fit almost perfectly even for a relatively small matrix of size $40 \times 40$. This suggests that concentration of measure actually happens pretty quickly.

Figure 2: Results for PSD matrices. Again oversampling versus $\mu$ is plotted. Simulations are done for $40 \times 40$ matrices and program (113) is solved with Gaussian measurements. Although the resolution is low (the simulations are not fine), it is not hard to see that the trace minimization weak threshold looks consistent with the simulations. Black and white regions mean failure and success respectively.

It would be interesting to calculate the limiting case of $\beta/\mu$ as $\beta \to 0$, to get an estimate of the minimum required oversampling when the rank is small. Secondly, we believe it might be possible to employ these methods not only for the linear region where the rank is $r = \beta n$ but for any case such as $r = O(1)$ or $r = O(\log(n))$. Such a study might give a small ($\sim 3, 4$) minimum oversampling rate. Although recent results of [19] showed that we need only $O(rn)$ samples for recovery (which is minimal), the constant is not known.

Finally, our result suggests a significant performance difference between trace minimization and the unique solution in the special case of PSD matrices. Although we'll not argue the reason here, it is actually quite intuitive. The uniqueness results suggest that one needs to sample at least half of the entries ($n(n+1)/4$ samples for PSD) to make sure that the positive semidefinite solution is unique. Clearly such a result might be interesting to know, but it is not useful at all.

Our first aim will be verifying our tightness claims. In order to do this, one needs to investigate the work of [15] further and come up with the conditions on the "mesh" under which (1) is tight.

Comment: Overall, the results of [18] established a powerful way to analyze some important questions in low rank matrix recovery.